ABSTRACT

Estimates of the coefficients $a$ and $b$ of the Fundamental Plane relation $R \propto \sigma^a I^b$
depend on whether one minimizes the scatter in the $R$ direction, or orthogonal to the
Plane. We provide explicit expressions for $a$ and $b$ (and confidence limits) in terms
of the covariances between $\log R$, $\log\sigma$ and $\log I$. Our expressions quantify the origin
of the difference between the direct, inverse and orthogonal fit coefficients. They also
show how to account for correlated errors, how to quantify the difference between the
Plane in a magnitude limited survey and one which is volume limited, how to determine
whether a scaling relation will be biased when using an apparent magnitude limited
survey, how to remove this bias, and why some forms of the $z \sim 0$ Plane appear to be
less affected by selection effects, but that this does not imply that they will remain
unaffected at high redshift. Finally, they show why, to a good approximation, the three
vectors associated with the Plane, one orthogonal to and the other two in it, can all
be written as simple combinations of $a$ and $b$. Essentially, this is a consequence of the
fact that the distribution of surface brightnesses is much broader than that of velocity
dispersions, and velocity dispersion and surface brightness are only weakly correlated.
Why this should be so for galaxies is a fundamental open question about the physics
of early-type galaxy formation. We argue that, if luminosity evolution is differential,
and sizes and velocity dispersions do not evolve, then this is just an accident: velocity
dispersion and surface brightness must have been correlated in the past. On the other
hand, if the (lack of) correlation is similar to that at the present time, then differential
luminosity evolution must have been accompanied by structural evolution. A model
in which the luminosities of low luminosity galaxies evolve more rapidly than do those
of higher luminosity galaxies is able to produce the observed decrease in $a$ (by a factor
of 2 at $z \sim 1$) while having $b$ decrease by only about 20 percent. In such a model, the
dynamical mass-to-light ratio is a steeper function of mass at higher $z$. Our analysis is
more generally applicable to any other correlations between three variables: e.g., the
color-magnitude-$\sigma$ relation, the luminosity and velocity dispersion of a galaxy and the
mass of its black hole, or the relation between the X-ray luminosity, Sunyaev-Zeldovich
decrement and optical richness of a cluster, so we provide IDL code which implements
these ideas. And, for completeness, we show how our analysis generalizes further to
correlations between more than three variables.

Key words: methods: analytical - methods: statistical - galaxies: formation - galaxies: 
fundamental parameters 



1 INTRODUCTION 

Early-type galaxies do not fill the full three dimensional 
space defined by size, central velocity dispersion and sur- 
face brightness (usually evaluated at the half light radius). 
Rather, they define a relatively thin manifold which has 
come to be called the Fundamental Plane (e.g. Djorgovski &
Davis 1987; Jørgensen et al. 1996; Pahre et al. 1998; Bernardi
et al. 2003; Jørgensen et al. 2006; Bolton et al. 2008; Hyde
& Bernardi 2009b).

The Fundamental Plane is usually written as

$$\log_{10}\frac{R_e}{\rm kpc} = a\,\log_{10}\frac{\sigma}{\rm km\,s^{-1}} + b\,\frac{\mu_e}{-2.5} + c, \qquad (1)$$

where $R_e$ is the half light radius, $\sigma$ is the velocity dispersion
(typically corrected to an aperture of $R_e/8$), and $\mu_e$ is the
surface brightness within $R_e$. The coefficient $a$ is loosely referred
to as the 'slope', and $c$ is the 'zero-point'; it is simply
$c = \langle\log_{10} R\rangle - a\,\langle\log_{10}\sigma\rangle + 0.4\,b\,\langle\mu_e\rangle$. The shape of the Fundamental
Plane is determined by estimating $a$ and $b$. The values of $a$ and $b$
are thought to encode useful information about these objects, because
the values $a = 2$ and $b = -1$ are expected on dimensional grounds if
the virial theorem holds exactly in the observed variables, and mass
is linearly proportional to light.

If $a \neq 2$ and/or $b \neq -1$ then the FP is said to be 'tilted'.
The tilt may be due to a combination of stellar population
effects, initial mass function variations, and variations in
the dark matter fraction within $R_e$ (e.g. Pahre et al. 1998;
Bernardi et al. 2003; Bolton et al. 2008; Hyde & Bernardi
2009b; Graves & Faber 2010). However, the inferred tilt also
depends on how the parameters $a$ and $b$ were measured. This
is typically done either by minimizing residuals in the $R_e$
direction, or in the direction orthogonal to the fit. In general
the 'direct' and 'orthogonal' fit parameters are different
combinations of the mean values of and covariances between
the variables $\log_{10} R$, $\log_{10}\sigma$ and $\mu$. Moreover, in practice,
naive estimation of these means and covariances (e.g. simply
summing over the data without including other weight
terms) may lead to biases induced by measurement errors
(these usually affect the covariances) or by selection effects
(which bias the means and the covariances). The effects of
both must be accounted for to estimate the intrinsic shape
parameters $a$ and $b$ (e.g. Saglia et al. 2001). This is especially
important when the FP is determined for galaxies in
a magnitude limited sample (Bernardi et al. 2003). 

The main goal of this paper is to provide analytic expressions
which describe the Plane for the direct, inverse and orthogonal
fitting procedures, and which show clearly how to account for
measurement errors and selection effects.
In addition, by providing analytic expressions for all quan- 
tities of interest, our results remove the need for numerical 
nonlinear minimization methods for obtaining the best-fit 
coefficients. Our analysis is complementary to that in Saglia 
et al. (2001), who provide an excellent description of the key 
differences between the different fitting procedures. When 
we illustrate the results of our analysis, the numerical val- 
ues we use come from the SDSS-based early-type sample 
compiled by Hyde & Bernardi (2009b). 

The discussion above has focussed on the direction of
the smallest scatter. If we think of the Plane as being defined
by three orthogonal vectors, one orthogonal to the Plane
and the others in it, then the parameters $a$ and $b$ describe
the vector which is orthogonal to the Plane. If $\Lambda_3$ denotes
this vector, and the other two vectors (in the Plane) are $\Lambda_1$
and $\Lambda_2$, then Saglia et al. (2001) showed that these three
eigenvectors are well-approximated by

$$\Lambda_3 = \mathbf{r} - a_{\rm orth}\,\mathbf{v} - b_{\rm orth}\,\mathbf{i},$$
$$\Lambda_2 \approx \mathbf{r} + \frac{1 + b_{\rm orth}^2}{a_{\rm orth}}\,\mathbf{v} - b_{\rm orth}\,\mathbf{i},$$
$$\Lambda_1 \approx \mathbf{r} + b_{\rm orth}^{-1}\,\mathbf{i}, \qquad (2)$$

where $\mathbf{r}$, $\mathbf{v}$ and $\mathbf{i}$ denote unit vectors in the size, velocity dis-
persion and surface brightness directions. Although Saglia et
al. justified these scalings using numerical experiments, we
show, in Section 2, that this form follows from the fact that
the distribution of surface brightnesses is much broader than
that of velocity dispersions.

Section 2 also shows that many of the properties of the
$z = 0$ Fundamental Plane can be understood as arising
from the fact that surface brightness and velocity disper-
sion are almost uncorrelated at $z = 0$. In Section 3 we argue
that, in models of pure luminosity evolution, this is only
a coincidence: the two were correlated in the past. A final
section summarizes our conclusions and discusses why mea-
surements of this correlation in high-$z$ datasets will provide
interesting constraints on models.

In an Appendix, we provide a description of how the 
FP coefficients differ between magnitude limited and vol- 
ume limited samples, when the underlying pairwise scaling 
relations are linear. Although there is now growing evidence 
for curvature in these relations (e.g. Bernardi et al. 2007a; 
Lauer et al. 2007; Hyde & Bernardi 2009a; Bernardi et al. 
2011), we feel our expressions are useful since the curva- 
ture is usually due to a small fraction of the objects in the 
tails of the distribution. Moreover, our expressions are gen- 
erally applicable to any study of three observables - not 
just those associated with the Fundamental Plane. It may 
be that the assumption of no curvature is more accurate for 
some of these other scaling relations. Some examples include 
the joint distribution of the luminosity and velocity disper- 
sion of a galaxy and its color or the mass of its black-hole 
(Bernardi et al. 2005; Bernardi et al. 2007b), or the rela- 
tion between the X-ray luminosity, SZ-signal strength and 
optical richness of a cluster. 



2 ANALYTIC DESCRIPTION OF THE 
FUNDAMENTAL PLANE 

The analysis which follows is actually the restriction to a 
special case of the following general statement. Since the 
general case is also of interest in these glorious days of large 
panchromatic datasets, we state it first. 



2.1 Conditional correlations between N variables 

Suppose we have $N$ observables which are distributed following
a multivariate Gaussian distribution having means $\mu_i$ and
covariance matrix $C_{ij}$. Suppose that we split them up into two
sets, $A$ with $n$ observables and $B$ with the other $N - n$. Let
$\mu_A$ and $C_{AA}$ denote the mean vector and covariance matrix of
set $A$, and similarly define $\mu_B$ and $C_{BB}$ for set $B$. Then the
distribution of $O_A = \{X_1, \ldots, X_n\}$, given that
$O_B = \{X_{n+1}, \ldots, X_N\}$ is known, is multivariate Gaussian with mean

$$\mu_{A|B} = \langle O_A|O_B\rangle = \mu_A + C_{AB}\,C_{BB}^{-1}\,(O_B - \mu_B), \qquad (3)$$

and covariance matrix

$$C_{A|B} = C_{AA} - C_{AB}\,C_{BB}^{-1}\,C_{BA}. \qquad (4)$$

In what follows, we will study the special case in which $N = 3$
and $n = 1$. Since this makes $C_{BB}$ a $2\times 2$ matrix, its inverse
is simple, so the expressions above are analytically tractable.
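Equations (3) and (4) translate directly into a few lines of linear algebra. The sketch below assumes NumPy; the function name `conditional_gaussian` is ours, and it uses linear solves rather than forming $C_{BB}^{-1}$ explicitly, which is numerically safer but otherwise equivalent.

```python
import numpy as np

def conditional_gaussian(mu, C, idx_a, idx_b, obs_b):
    """Mean and covariance of set A given observed values of set B (eqs 3-4).

    mu, C : mean vector and covariance matrix of all N observables
    idx_a : indices of the n observables in set A
    idx_b : indices of the N - n observables in set B
    obs_b : observed values O_B of the set-B observables
    """
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    C_aa = C[np.ix_(idx_a, idx_a)]
    C_ab = C[np.ix_(idx_a, idx_b)]
    C_bb = C[np.ix_(idx_b, idx_b)]
    # Solve C_BB x = (O_B - mu_B) instead of inverting C_BB.
    x = np.linalg.solve(C_bb, np.asarray(obs_b, dtype=float) - mu[idx_b])
    mean = mu[idx_a] + C_ab @ x
    # C_BA is the transpose of C_AB because C is symmetric.
    cov = C_aa - C_ab @ np.linalg.solve(C_bb, C_ab.T)
    return mean, cov

# Bivariate check: unit variances, correlation 0.5, X_2 observed to be 2.
mean, cov = conditional_gaussian([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [0], [1], [2.0])
print(mean, cov)   # mean 1.0, variance 0.75
```

With $N = 3$, $n = 1$ and $A = \{R\}$, the row vector $C_{AB}C_{BB}^{-1}$ contains exactly the slopes of a least-squares fit of $R$ on the two remaining variables.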
Table 1. Coefficients of various fits to the Fundamental Plane
$R \propto \sigma^a I^b$ in the r-band sample of about 40000 objects defined
by Hyde & Bernardi (2009b), after correcting for the magnitude
limit selection effect. Confidence limits ignore the contribution
from systematic errors.

Fit           a                b
Direct        1.167 ± 0.014    -0.757 ± 0.009
Inverse       1.606 ± 0.023    -0.792 ± 0.010
SB            1.219 ± 0.017    -1.028 ± 0.009
Orthogonal    1.434 ± 0.015    -0.787 ± 0.010


2.2 Restriction to N = 3 

For our three variables, we will use $R$, $V$ and $I$ to denote
$\log(R_e/{\rm kpc})$, $\log(\sigma/{\rm km\,s^{-1}})$ and $\log(I/(L_\odot\,{\rm pc}^{-2}))$. Let $C$ denote
the real symmetric matrix which describes the covariances
between these three variables:

$$C = \begin{pmatrix} C_{II} & C_{IR} & C_{IV}\\ C_{IR} & C_{RR} & C_{RV}\\ C_{IV} & C_{RV} & C_{VV}\end{pmatrix}. \qquad (5)$$

The shape of the Fundamental Plane is completely determined
by this covariance matrix. Hence, our problem is to
estimate the coefficients of this matrix in a way which accounts
for selection effects and measurement errors (see Section 2.3).

In what follows, we will provide expressions for various
quantities which can be derived from $C$. Although our
expressions are general, we will sometimes remark on what
they imply. In such cases, we will use the values reported by
Hyde & Bernardi (2009b):

$$C = \begin{pmatrix} 0.0471 & -0.0313 & 0.0038\\ -0.0313 & 0.0552 & 0.0189\\ 0.0038 & 0.0189 & 0.0187\end{pmatrix}, \qquad (6)$$

where $I$ was measured in dex (rather than magnitudes).
In particular, Table 1 summarizes the various values of $a$
and $b$ which can be derived from this $C$, depending on how
one fits the Fundamental Plane. Note that these coefficients
are often determined via numerical nonlinear minimization
schemes. In the following subsections, we provide analytic
expressions for these parameters, thus eliminating the need
for such schemes.

Note that $|C_{IV}|$ is the smallest element of $C$. To remove
the effect of the fact that the rms of $I$ is much larger than
that of $V$ (and depends on whether $I$ is measured in
dex or in mags!), we can normalize all quantities by their
rms values. If we define

$$r_{XY} \equiv \frac{C_{XY}}{\sqrt{C_{XX}\,C_{YY}}} \qquad (7)$$

and call the resulting correlation matrix $\mathcal{R}$, then

$$\mathcal{R} = \begin{pmatrix} 1 & -0.614 & 0.128\\ -0.614 & 1 & 0.588\\ 0.128 & 0.588 & 1\end{pmatrix}. \qquad (8)$$

This shows that $r_{IV}$ is indeed much smaller than $|r_{IR}|$ or $r_{RV}$:
surface brightness and velocity dispersion are almost uncorrelated.
This turns out to be a simple way to understand
many features of the Fundamental Plane.
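The normalization of equation (7) can be checked directly against equation (8). This is a minimal NumPy sketch using the rounded matrix of equation (6); `correlation_matrix` is a helper name chosen for illustration.

```python
import numpy as np

# Covariance matrix of (I, R, V) from equation (6), with I in dex.
C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])

def correlation_matrix(C):
    """Normalize a covariance matrix by its rms values: r_XY = C_XY / sqrt(C_XX C_YY)."""
    rms = np.sqrt(np.diag(C))
    return C / np.outer(rms, rms)

Rcorr = correlation_matrix(C)
print(np.round(Rcorr, 3))   # off-diagonal terms round to -0.614, 0.128 and 0.588
```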



2.3 Accounting for selection effects and 
measurement errors 

In an apparent magnitude limited survey of $N_{\rm obj}$ objects, the
mean value of an observed quantity $X$, $\bar X = \sum_{\rm obj} X_i/N_{\rm obj}$,
may be biased from its true mean value (e.g., if the observable
correlates with luminosity). Fortunately, this bias
is easily removed by defining, for each object with luminosity
$L_i$, the total volume over which the object could have
been observed: $V_{\rm max}(L_i)$ (e.g. Schmidt 1968). One then uses
this to define a (normalized) weight

$$w_i = \frac{V_{\rm max}^{-1}(L_i)}{\sum_j V_{\rm max}^{-1}(L_j)} \qquad (9)$$

and estimates the mean value of $X$ as

$$\langle X\rangle = \sum_i w_i\,X_i, \qquad (10)$$

where the sum is over all the objects in the sample.

For similar reasons, the covariance between observables
will also be biased by the selection effect, but this bias can
be removed by applying the same weight. The covariance
may also be biased by measurement errors. If we define the
matrix $O$ to have elements

$$O_{XY} = \sum_i w_i\,(X_i - \langle X\rangle)(Y_i - \langle Y\rangle) \qquad (11)$$

and the measurement error matrix $E$ by

$$E_{XY} = \sum_i w_i\,\langle e_X e_Y\rangle_i \qquad (12)$$

(we have assumed zero mean for the errors, and often,
$\langle e_X e_Y\rangle_i$ is assumed to be the same for all objects), then

$$C = O - E \qquad (13)$$

is an unbiased estimate of the intrinsic covariance matrix.
Notice that each element of $C$ has had the contribution from
measurement errors to the observed covariance subtracted
off: $C_{XY} = O_{XY} - E_{XY}$. If this term is not subtracted, i.e.,
if one uses $O$ instead of $C$ in what follows, one will obtain a
Plane that has been distorted by measurement error. In the
Appendix, we quantify the bias which results from ignoring
the $V_{\rm max}^{-1}$ weight; i.e., of setting $w_i = 1/N_{\rm obj}$ for all $i$.
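Equations (9)-(13) amount to a weighted mean and covariance followed by an error subtraction. The NumPy sketch below is one way to write them; the function name and the array layout for the per-object error covariances $\langle e_X e_Y\rangle_i$ are our own assumptions, not a prescribed interface.

```python
import numpy as np

def vmax_weighted_moments(X, vmax, err_cov=None):
    """Selection- and error-corrected mean and covariance, equations (9)-(13).

    X       : (N_obj, n_var) array of observables, e.g. columns (I, R, V)
    vmax    : (N_obj,) array of V_max(L_i)
    err_cov : optional (N_obj, n_var, n_var) array of <e_X e_Y>_i
    """
    X = np.asarray(X, dtype=float)
    w = 1.0 / np.asarray(vmax, dtype=float)
    w = w / w.sum()                          # normalized weights, equation (9)
    mean = w @ X                             # <X> = sum_i w_i X_i, equation (10)
    d = X - mean
    O = (w[:, None] * d).T @ d               # O_XY, equation (11)
    if err_cov is None:
        return mean, O
    E = np.einsum("i,ijk->jk", w, np.asarray(err_cov, dtype=float))  # E_XY, eq. (12)
    return mean, O - E                       # C = O - E, equation (13)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
mean, O = vmax_weighted_moments(X, np.ones(200))
errs = np.tile(0.01 * np.eye(3), (200, 1, 1))   # identical errors for every object
_, C_corr = vmax_weighted_moments(X, np.ones(200), errs)
```

With equal $V_{\rm max}$ for every object the weights reduce to $1/N_{\rm obj}$, so $O$ is the ordinary (biased, $1/N$) sample covariance, which is what the usage lines exercise.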

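Some workers like to account for the fact that certain
measurements are more secure than others by weighting each
measurement by the inverse of the estimated uncertainty on
it. In this case, if one defines

$$O_{XY} = \frac{\sum_i w_i\,(X_i - \langle X\rangle)(Y_i - \langle Y\rangle)/\sqrt{\langle e_X^2\rangle_i\langle e_Y^2\rangle_i}}{\sum_i w_i/\sqrt{\langle e_X^2\rangle_i\langle e_Y^2\rangle_i}}, \qquad (14)$$

where

$$\langle X\rangle = \frac{\sum_i w_i\,X_i/\sqrt{\langle e_X^2\rangle_i}}{\sum_i w_i/\sqrt{\langle e_X^2\rangle_i}}, \qquad (15)$$

then one must also define

$$E_{XY} = \frac{\sum_i w_i\,\langle e_X e_Y\rangle_i/\sqrt{\langle e_X^2\rangle_i\langle e_Y^2\rangle_i}}{\sum_i w_i/\sqrt{\langle e_X^2\rangle_i\langle e_Y^2\rangle_i}} \qquad (16)$$

before estimating

$$C_{XY} = O_{XY} - E_{XY} \qquad (17)$$

as before. In practice, it makes sense to replace $1/\sqrt{\langle e_X^2\rangle} \to 1/\sqrt{e_{\rm min}^2 + \langle e_X^2\rangle}$
for some $e_{\rm min}$ that is chosen to prevent a
few well-measured objects from dominating the sums.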
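A sketch of the uncertainty-weighted estimators (14)-(17), including the $e_{\rm min}$ softening, assuming the same array layout as before; the function name and the default `e_min=0` are illustrative choices, not part of the method as published.

```python
import numpy as np

def uncertainty_weighted_moments(X, err, w, e_min=0.0):
    """Equations (14)-(17), weighting object i by 1/sqrt(e_min^2 + <e_X^2>_i).

    X     : (N_obj, n_var) observables
    err   : (N_obj, n_var, n_var) per-object error covariances <e_X e_Y>_i
    w     : (N_obj,) selection weights, e.g. the V_max weights of equation (9)
    e_min : softening floor (hypothetical default of zero)
    """
    X, err, w = (np.asarray(t, dtype=float) for t in (X, err, w))
    s = 1.0 / np.sqrt(e_min**2 + np.einsum("ijj->ij", err))   # per-variable 1/sigma_i
    ws = w[:, None] * s
    mean = (ws * X).sum(axis=0) / ws.sum(axis=0)                # equation (15)
    d = X - mean
    # Pairwise weights w_i / sqrt(<e_X^2>_i <e_Y^2>_i), shape (N_obj, n_var, n_var).
    wp = w[:, None, None] * s[:, :, None] * s[:, None, :]
    norm = wp.sum(axis=0)
    O = np.einsum("ijk,ij,ik->jk", wp, d, d) / norm             # equation (14)
    E = (wp * err).sum(axis=0) / norm                           # equation (16)
    return mean, O - E                                          # equation (17)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
err = np.tile(0.04 * np.eye(3), (100, 1, 1))
mean_u, C_u = uncertainty_weighted_moments(X, err, np.ones(100))
```

When every object has the same errors and the same selection weight, the uncertainty weights cancel and the result reduces to the plain covariance minus the error matrix.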
2.4 The parameters of the direct fit

If we write the Fundamental Plane as

$$R - \langle R\rangle = a\,(V - \langle V\rangle) + b\,(I - \langle I\rangle), \qquad (18)$$

then

$$a_{\rm direct} = \frac{(C_{RV}/C_{VV}) - (C_{IR}/C_{II})(C_{IV}/C_{VV})}{1 - (C_{IV}/C_{II})(C_{IV}/C_{VV})} \qquad (19)$$

$$\qquad = \frac{C_{RV}}{C_{VV}}\,\frac{1 - r_{IV}\,r_{IR}/r_{RV}}{1 - r_{IV}^2}, \qquad (20)$$

$$b_{\rm direct} = (C_{IR}/C_{II}) - a_{\rm direct}\,(C_{IV}/C_{II}) \qquad (21)$$

$$\qquad = \frac{(C_{IR}/C_{II}) - (C_{IV}/C_{II})(C_{RV}/C_{VV})}{1 - (C_{IV}/C_{II})(C_{IV}/C_{VV})} \qquad (22)$$

$$\qquad = \frac{C_{IR}}{C_{II}}\,\frac{1 - r_{IV}\,r_{RV}/r_{IR}}{1 - r_{IV}^2} \qquad (23)$$

(Bernardi et al. 2003). Note that because of how we defined
our $C_{XY}$, these expressions have been corrected for the effects
of errors, and because of the weighting term $w_i$, they
have been corrected for selection effects.

Equation (19) shows that $a_{\rm direct}$ is simply the slope of the
correlation between $R$ and $V$ minus the contribution which comes
from the $R$-$I$ and $I$-$V$ correlations. Similarly, $b_{\rm direct}$ is the
slope of the correlation between $R$ and $I$ minus the contribution which
comes from the $R$-$V$ and $I$-$V$ correlations. It might
help to think of these as follows. Let $X_{R|I} = R - \langle R\rangle -
(C_{RI}/C_{II})(I - \langle I\rangle)$ denote the residual in $R$ from the $R$-$I$
correlation. Then $\langle X_{R|I}\,V\rangle = C_{RV} - (C_{RI}/C_{II})\,C_{IV}$. Therefore,
$a_{\rm direct}$ is the ratio of $\langle X_{R|I}\,V\rangle$ to the variance of $V$
at fixed $I$, $C_{VV}(1 - r_{IV}^2)$, so it is the slope of the correlation
between $X_{R|I}$ and $V$, at fixed $I$. Of course, $b_{\rm direct}$ can be
understood similarly.
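As a numerical check of equations (19) and (22), the sketch below evaluates them for the rounded matrix of equation (6) and compares with a generic least-squares solve (equation 3 with $A = \{R\}$, $B = \{V, I\}$). Assuming NumPy; `direct_fit` is an illustrative name. The results land within a few thousandths of the Direct row of Table 1; the small residual difference presumably reflects the rounding of equation (6).

```python
import numpy as np

# Covariance matrix of (I, R, V) from equation (6); ordering as in equation (5).
C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2

def direct_fit(C):
    """(a_direct, b_direct) from equations (19) and (22)."""
    r_iv2 = C[I, V] ** 2 / (C[I, I] * C[V, V])          # r_IV^2
    a = (C[R, V] / C[V, V] - (C[I, R] / C[I, I]) * (C[I, V] / C[V, V])) / (1 - r_iv2)
    b = (C[I, R] / C[I, I] - (C[I, V] / C[I, I]) * (C[R, V] / C[V, V])) / (1 - r_iv2)
    return a, b

a_dir, b_dir = direct_fit(C)

# Cross-check: the regression coefficients C_{R,B} C_{BB}^{-1} of equation (3), B = (V, I).
idx = [V, I]
coef = np.linalg.solve(C[np.ix_(idx, idx)], C[idx, R])
print(a_dir, b_dir, coef)
```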

The fact that, in the data, neither $b_{\rm direct}$ nor $a_{\rm direct}$ is
zero implies that both the $R$-$V$ and $I$-$R$ correlations are
fundamental: they are not consequences of other relations.
Moreover, note that if $C_{IV} = 0$ (i.e., $r_{IV} = 0$), then $a_{\rm direct}$
and $b_{\rm direct}$ are really just the slopes of the $\langle R|V\rangle$ and $\langle R|I\rangle$
relations. In addition, if $C_{IV} \approx 0$, then the Direct fit has
the convenient property that the errors on the fitted coefficients
$a_{\rm direct}$ and $b_{\rm direct}$ are independent. We show below
that $C_{IV} \approx 0$ turns out to be an easy way to understand
some properties of the Fundamental Plane.

This form of the Plane (i.e., the Direct fit) should be
used if the distance independent quantities $V$ and $I$ are used
to predict the distance dependent one, $R$. The accuracy with
which $R$ is predicted by $I$ and $V$ is limited by the rms scatter
around this fit, which is the square root of

$$\langle\Delta R^2_{\rm direct}\rangle = C_{RR}\,\frac{1 - r_{RV}^2 - r_{IV}^2 - r_{IR}^2 + 2\,r_{IR}\,r_{IV}\,r_{RV}}{1 - r_{IV}^2}. \qquad (24)$$
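Equation (24) is the same quantity as the conditional variance of $R$ at fixed $(V, I)$ from equation (4). The NumPy sketch below, using the rounded matrix of equation (6) and an illustrative helper name, evaluates both forms; for these inputs the rms scatter comes out close to 0.1 dex.

```python
import numpy as np

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2   # ordering of equation (5)

def direct_scatter(C):
    """<Delta R^2_direct> from equation (24), written with correlation coefficients."""
    rms = np.sqrt(np.diag(C))
    r = C / np.outer(rms, rms)
    num = (1 - r[R, V]**2 - r[I, V]**2 - r[I, R]**2
           + 2 * r[I, R] * r[I, V] * r[R, V])
    return C[R, R] * num / (1 - r[I, V]**2)

# Same quantity as the conditional variance of R given (V, I), equation (4).
idx = [V, I]
cond_var = C[R, R] - C[R, idx] @ np.linalg.solve(C[np.ix_(idx, idx)], C[idx, R])
print(np.sqrt(direct_scatter(C)), np.sqrt(cond_var))
```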

Confidence limits on $a_{\rm direct}$ and $b_{\rm direct}$ themselves can
be obtained as follows. If there were no measurement errors,
then the 68% confidence limits on the best fit values
$a_{\rm direct}$ and $b_{\rm direct}$ would be given by the square root of
$\langle\Delta R^2_{\rm direct}\rangle/[N_{\rm obj}\,C_{V|I}]$ and $\langle\Delta R^2_{\rm direct}\rangle/[N_{\rm obj}\,C_{I|V}]$, where we
have defined $C_{Y|X} = C_{YY}(1 - r_{XY}^2)$ and $\langle\Delta R^2_{\rm direct}\rangle$ is given
by equation (24). Note that the square of the confidence limit on $a_{\rm direct}$ is
proportional to the scatter around the best fit, $\langle\Delta R^2_{\rm direct}\rangle$,
divided by the number of degrees of freedom (which is essentially
the sample size), as one might expect. However,
it also scales inversely with $C_{V|I}$ because, as the intrinsic
spread in $V$ at fixed $I$ decreases, it becomes increasingly difficult
to measure the slope of the $R$-$V$ relation (at fixed $I$).
Similar arguments apply to $b_{\rm direct}$. This means that the uncertainty
on $a_{\rm direct}$ will be $(C_{II}/C_{VV})^{1/2}$ times the uncertainty
on $b_{\rm direct}$, independent of sample size. The errors on these
best-fitting coefficients are correlated: their covariance is
$-\langle\Delta R^2_{\rm direct}\rangle\,C_{IV}/[N_{\rm obj}\,(C_{II}C_{VV} - C_{IV}^2)]$, which is nonzero
if $C_{IV} \neq 0$.
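The intrinsic limits above are the standard least-squares covariance of the two slopes, $\langle\Delta R^2_{\rm direct}\rangle\,S^{-1}/N_{\rm obj}$ with $S$ the covariance matrix of $(V, I)$. The sketch below assumes NumPy, equal weights and no measurement errors; `direct_fit_errors` is a hypothetical helper, and the 40000 objects only mirror the sample size quoted in Table 1 (the Table 1 limits also reflect weights and measurement errors, so they are not expected to match).

```python
import numpy as np

def direct_fit_errors(C, n_obj):
    """Intrinsic 68% limits on (a_direct, b_direct) and their covariance.

    Ignores measurement errors and assumes equal weights; C is ordered (I, R, V).
    """
    C = np.asarray(C, dtype=float)
    I, R, V = 0, 1, 2
    idx = [V, I]
    S = C[np.ix_(idx, idx)]                  # covariance of the predictors (V, I)
    coef = np.linalg.solve(S, C[idx, R])     # (a_direct, b_direct)
    dR2 = C[R, R] - C[R, idx] @ coef         # <Delta R^2_direct>, equation (24)
    cov = dR2 * np.linalg.inv(S) / n_obj     # covariance matrix of (a, b)
    return np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1]), cov[0, 1]

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
sig_a, sig_b, cov_ab = direct_fit_errors(C, n_obj=40000)
print(sig_a, sig_b, sig_a / sig_b, np.sqrt(C[0, 0] / C[2, 2]))
```

Since $(S^{-1})_{VV} = 1/C_{V|I}$ and $(S^{-1})_{II} = 1/C_{I|V}$, this reproduces the expressions in the text, and the ratio `sig_a / sig_b` equals $(C_{II}/C_{VV})^{1/2}$ whatever the value of `n_obj`.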

Measurement errors (random, not systematic) decrease
the precision of these estimates as follows. If

$$\chi^2_{\rm obs,dir} = O_{RR} + a_{\rm direct}^2\,O_{VV} + b_{\rm direct}^2\,O_{II} - 2a_{\rm direct}\,O_{RV} - 2b_{\rm direct}\,O_{IR} + 2a_{\rm direct}b_{\rm direct}\,O_{IV}$$

(note that this is just the observational analogue of equation 24), then the squares of the limits on $a_{\rm direct}$ and
$b_{\rm direct}$ are well-approximated by $\chi^2_{\rm obs,dir}\,(O^w_{V|I}/C_{V|I})/C_{V|I}$
and $\chi^2_{\rm obs,dir}\,(O^w_{I|V}/C_{I|V})/C_{I|V}$, respectively, where
$O^w_{Y|Z} = O^w_{YY} - 2(C_{YZ}/C_{ZZ})\,O^w_{YZ} + (C_{YZ}/C_{ZZ})^2\,O^w_{ZZ}$, with
$O^w_{YZ} = \sum_i w_i^2\,(Y_i - \langle Y\rangle)(Z_i - \langle Z\rangle)$, for $(Y,Z) = (V,I)$ or $(I,V)$ respectively.

The superscript $w$ is to remind us that $O^w_{Y|Z}$ carries
an extra weighting factor compared to $O_{Y|Z}$. It may
be helpful to think of $O_{Y|Z}/O^w_{Y|Z}$ as defining an effective
sample size $N_{Y|Z}$. This is because, if all the weights
are the same then (because our weights are normalized)
$w_i = 1/N_{\rm obj}$, so $O_{Y|Z}/O^w_{Y|Z} = N_{\rm obj}$. Thus, the factor
$O^w_{Y|Z}/C_{Y|Z}$ is really $(O_{Y|Z}/C_{Y|Z})/N_{Y|Z}$, making the correspondence
with the case in which there were no measurement
errors obvious: one replaces $\langle\Delta R^2\rangle \to \chi^2_{\rm obs,dir}$ and
$C_{Y|Z} \to C_{Y|Z}\,(C_{Y|Z}/O_{Y|Z})$ (to account for measurement
errors) and $N_{\rm obj} \to N_{Y|Z}$ (to account for the weights). If
each measurement was weighted by its uncertainty, then all
$O^w_{XY}$ are given by equation (14) with
$w_i^2$ in the sum in the numerator, but only $w_i$ in the denominator.
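The effective sample size $N_{Y|Z} = O_{Y|Z}/O^w_{Y|Z}$ can be computed from the data and weights alone. In this NumPy sketch (`effective_sample_size` is a hypothetical name), the ratio $k = C_{YZ}/C_{ZZ}$ is estimated from the weighted observed moments, i.e. without the measurement-error correction of equation (13); that simplification is an assumption of the sketch.

```python
import numpy as np

def effective_sample_size(y, z, w):
    """N_{Y|Z} = O_{Y|Z} / O^w_{Y|Z} for one pair of observables (Y, Z).

    O^w uses w_i**2 where O uses w_i. Simplification: k = C_YZ / C_ZZ is
    estimated from the weighted observed moments (no error subtraction).
    """
    y, z, w = (np.asarray(t, dtype=float) for t in (y, z, w))
    w = w / w.sum()                              # normalized weights, equation (9)
    dy, dz = y - w @ y, z - w @ z
    k = (w @ (dy * dz)) / (w @ (dz * dz))

    def o_cond(weights):
        # O_{Y|Z} = O_YY - 2 k O_YZ + k^2 O_ZZ for the given per-object weights
        return (weights @ (dy * dy) - 2 * k * (weights @ (dy * dz))
                + k**2 * (weights @ (dz * dz)))

    return o_cond(w) / o_cond(w**2)

rng = np.random.default_rng(2)
z = rng.normal(size=500)
y = 0.5 * z + rng.normal(size=500)
print(effective_sample_size(y, z, np.ones(500)))              # equal weights
print(effective_sample_size(y, z, rng.uniform(0.1, 1.0, 500)))  # unequal weights
```

With equal weights the ratio is exactly $N_{\rm obj}$, which is the correspondence described in the text.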



2.5 The parameters of the inverse fit 

Some authors prefer to keep the spectroscopic quantity $V$
as the dependent variable, and so fit

$$V - \langle V\rangle = \frac{R - \langle R\rangle}{a_{\rm inv}} - \frac{b_{\rm inv}}{a_{\rm inv}}\,(I - \langle I\rangle). \qquad (25)$$

This has some merit, because the measurement of $V$ is often
much noisier than that of the combination of $R$ and $I$ which
defines the Plane (e.g. correlated errors in $R$ and $I$ when
fitting to the surface brightness profile mean that $0.3\mu - R$
is typically determined to within 0.005). If the errors are
essentially all on $V$, then they do not bias the coefficients of
the 'direct' fit to this relation, so one can safely ignore them
when estimating the coefficients of the fit.

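So, the question arises as to how well $(a_{\rm inv}, b_{\rm inv})$ approximate
$(a_{\rm direct}, b_{\rm direct})$. By simply interchanging $R$ and $V$ in
the expressions above, one finds

$$a_{\rm inv} = \frac{1 - (C_{IR}/C_{II})(C_{IR}/C_{RR})}{(C_{RV}/C_{RR}) - (C_{IV}/C_{II})(C_{IR}/C_{RR})} \qquad (26)$$

$$\qquad = \frac{C_{RR}}{C_{RV}}\,\frac{1 - r_{IR}^2}{1 - r_{IV}\,r_{IR}/r_{RV}} \qquad (27)$$

$$\qquad = a_{\rm direct}\,\frac{(1 - r_{IR}^2)(1 - r_{IV}^2)}{(r_{RV} - r_{IV}\,r_{IR})^2}, \qquad (28)$$

$$b_{\rm inv} = -a_{\rm inv}\,\frac{(C_{IV}/C_{II}) - (C_{IR}/C_{II})(C_{RV}/C_{RR})}{1 - (C_{IR}/C_{II})(C_{IR}/C_{RR})} \qquad (29)$$

$$\qquad = -\frac{(C_{IV}/C_{II}) - (C_{IR}/C_{II})(C_{RV}/C_{RR})}{(C_{RV}/C_{RR}) - (C_{IV}/C_{II})(C_{IR}/C_{RR})} \qquad (30)$$

$$\qquad = b_{\rm direct}\,\frac{(r_{IR}\,r_{RV} - r_{IV})(1 - r_{IV}^2)}{(r_{RV} - r_{IV}\,r_{IR})(r_{IR} - r_{IV}\,r_{RV})}, \qquad (31)$$

with rms scatter equal to the square root of

$$\langle\Delta V^2_{\rm inv}\rangle = C_{VV}\,\frac{1 - r_{RV}^2 - r_{IV}^2 - r_{IR}^2 + 2\,r_{IR}\,r_{IV}\,r_{RV}}{1 - r_{IR}^2}. \qquad (32)$$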
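The sketch below evaluates equations (26) and (30) for the rounded matrix of equation (6) and checks them against the correlation-coefficient forms (28) and (31). It assumes NumPy; `inverse_fit` is an illustrative name. For these inputs the values fall within a few thousandths of the Inverse row of Table 1.

```python
import numpy as np

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2

def inverse_fit(C):
    """(a_inv, b_inv) from equations (26) and (30)."""
    den = C[R, V] / C[R, R] - (C[I, V] / C[I, I]) * (C[I, R] / C[R, R])
    a_inv = (1 - (C[I, R] / C[I, I]) * (C[I, R] / C[R, R])) / den
    b_inv = -(C[I, V] / C[I, I] - (C[I, R] / C[I, I]) * (C[R, V] / C[R, R])) / den
    return a_inv, b_inv

a_inv, b_inv = inverse_fit(C)

# Direct-fit coefficients (regression of R on V and I) for comparison.
idx = [V, I]
a_dir, b_dir = np.linalg.solve(C[np.ix_(idx, idx)], C[idx, R])

# Equations (28) and (31), written with correlation coefficients.
rms = np.sqrt(np.diag(C))
r = C / np.outer(rms, rms)
rIR, rIV, rRV = r[I, R], r[I, V], r[R, V]
a_eq28 = a_dir * (1 - rIR**2) * (1 - rIV**2) / (rRV - rIV * rIR)**2
b_eq31 = b_dir * (rIR * rRV - rIV) * (1 - rIV**2) / ((rRV - rIV * rIR) * (rIR - rIV * rRV))
print(a_inv, b_inv, a_eq28, b_eq31)
```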



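The intrinsic uncertainty on $1/a_{\rm inv}$ is
$\langle\Delta V^2_{\rm inv}\rangle^{1/2}/[N_{\rm obj}\,C_{R|I}]^{1/2}$, and that for $b_{\rm inv}/a_{\rm inv}$
is $(C_{RR}/C_{II})^{1/2}$ times that on $1/a_{\rm inv}$. However,
the uncertainties on $a_{\rm inv}$ and $b_{\rm inv}$ themselves are
$\langle\Delta V^2_{\rm inv}\rangle^{1/2}\,a_{\rm inv}^2/[N_{\rm obj}\,C_{R-b_{\rm inv}I}]^{1/2}$, where
$C_{R-b_{\rm inv}I} = C_{RR} - 2b_{\rm inv}C_{IR} + b_{\rm inv}^2C_{II}$, and
$\langle\Delta V^2_{\rm inv}\rangle^{1/2}\,a_{\rm inv}/[N_{\rm obj}\,C_{II}]^{1/2}$. As before, a good estimate
of the uncertainties in the presence of measurement
errors and weights comes from replacing $\langle\Delta V^2_{\rm inv}\rangle \to \chi^2_{\rm obs,inv}$,
$C_{R-b_{\rm inv}I} \to C_{R-b_{\rm inv}I}\,(C_{R-b_{\rm inv}I}/O_{R-b_{\rm inv}I})$ and
$N_{\rm obj} \to O_{R-b_{\rm inv}I}/O^w_{R-b_{\rm inv}I}$ for $a_{\rm inv}$, and $N_{\rm obj}C_{II} \to C_{II}\,(C_{II}/O^w_{II})$
for $b_{\rm inv}$.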

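Notice that, in general, $a_{\rm inv} \neq a_{\rm direct}$ and $b_{\rm inv} \neq b_{\rm direct}$.
E.g., if $C_{IV} \to 0$ then

$$a_{\rm inv} \to a_{\rm direct}\,\frac{1 - r_{IR}^2}{r_{RV}^2} \quad {\rm and} \quad b_{\rm inv} \to b_{\rm direct}. \qquad (33)$$

In this limit the determinant of $C$ (the matrix defined in equation 5)
is $C_{II}C_{RR}C_{VV}\,(1 - r_{IR}^2 - r_{RV}^2)$, which must be positive, so
$1 - r_{IR}^2 > r_{RV}^2$, which means $|a_{\rm inv}| > |a_{\rm direct}|$. Thus, although
$b_{\rm inv} = b_{\rm direct}$ in this limit, $a_{\rm inv} \neq a_{\rm direct}$. Therefore, the
temptation to rearrange equation (25) so as to use $a_{\rm inv}V +
b_{\rm inv}I$ to estimate $R$ should be avoided, as it is guaranteed to
lead to a bias. In addition to a bias, the associated noise in
this estimator of $R$,

$$\langle\Delta R^2_{\rm inv}\rangle = C_{RR} + a_{\rm inv}^2C_{VV} + b_{\rm inv}^2C_{II} - 2a_{\rm inv}C_{RV} - 2b_{\rm inv}C_{IR} + 2a_{\rm inv}b_{\rm inv}C_{IV}, \qquad (34)$$

is larger than $\langle\Delta R^2_{\rm direct}\rangle$.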
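The statement that the rearranged inverse fit is a noisier predictor of $R$ can be seen numerically: equation (34) is the variance of $R - aV - bI$, which the direct coefficients minimize by construction. This NumPy sketch uses the rounded matrix of equation (6); `scatter_in_R` is an illustrative name.

```python
import numpy as np

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2

def scatter_in_R(C, a, b):
    """Variance of R - a V - b I, i.e. the right-hand side of equation (34)."""
    return (C[R, R] + a**2 * C[V, V] + b**2 * C[I, I]
            - 2 * a * C[R, V] - 2 * b * C[I, R] + 2 * a * b * C[I, V])

idx = [V, I]
a_dir, b_dir = np.linalg.solve(C[np.ix_(idx, idx)], C[idx, R])   # direct fit

# Inverse fit: regress V = alpha R + beta I, then rewrite as R = a_inv V + b_inv I.
alpha, beta = np.linalg.solve(C[np.ix_([R, I], [R, I])], C[[R, I], V])
a_inv, b_inv = 1 / alpha, -beta / alpha

print(scatter_in_R(C, a_dir, b_dir), scatter_in_R(C, a_inv, b_inv))
```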



2.6 The SB fit: Predicting I from R and V 

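For completeness (though see Graves & Faber 2010 for why
this might be an interesting choice), we now give the result
of fitting the Plane when $I$ is the dependent variable:

$$I - \langle I\rangle = \frac{R - \langle R\rangle}{b_I} - \frac{a_I}{b_I}\,(V - \langle V\rangle). \qquad (35)$$

In this case,

$$a_I = \frac{C_{RV}}{C_{VV}}\,\frac{(C_{IR}/C_{RR}) - (C_{IV}/C_{RV})}{(C_{IR}/C_{RR}) - (C_{IV}/C_{VV})(C_{RV}/C_{RR})}, \qquad (36)$$

$$b_I = \frac{1 - (C_{RV}/C_{VV})(C_{RV}/C_{RR})}{(C_{IR}/C_{RR}) - (C_{IV}/C_{VV})(C_{RV}/C_{RR})}. \qquad (37)$$

© 0000 RAS, MNRAS OOO.fTlfTTl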



The intrinsic error on $a_I$ is $\langle\Delta I^2\rangle^{1/2}\,b_I/[N_{\rm obj}\,C_{VV}]^{1/2}$ and on
$b_I$ is $\langle\Delta I^2\rangle^{1/2}\,b_I^2/[N_{\rm obj}\,C_{R-a_IV}]^{1/2}$, with the usual replacements
to account for measurement errors.

It is straightforward to verify that, like the inverse fit,
$a_I(V - \langle V\rangle) + b_I(I - \langle I\rangle)$ is also a biased predictor of $R - \langle R\rangle$.
E.g., if $r_{IV} = 0$, then $a_I \to a_{\rm direct}$ but $b_I \to b_{\rm direct}\,(1 -
r_{RV}^2)/r_{IR}^2$, so $|b_I| > |b_{\rm direct}|$.
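A quick check of equations (36) and (37): they should agree with a direct regression of $I$ on $(R, V)$, written as $I = gR + dV$ so that $b_I = 1/g$ and $a_I = -d/g$. Assuming NumPy and the rounded matrix of equation (6); `sb_fit` is an illustrative name. For these inputs the values fall within about 0.001 of the SB row of Table 1.

```python
import numpy as np

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2

def sb_fit(C):
    """(a_I, b_I) from equations (36) and (37)."""
    den = C[I, R] / C[R, R] - (C[I, V] / C[V, V]) * (C[R, V] / C[R, R])
    a_I = (C[R, V] / C[V, V]) * (C[I, R] / C[R, R] - C[I, V] / C[R, V]) / den
    b_I = (1 - (C[R, V] / C[V, V]) * (C[R, V] / C[R, R])) / den
    return a_I, b_I

a_I, b_I = sb_fit(C)

# Cross-check: least-squares regression I = g R + d V.
g, d = np.linalg.solve(C[np.ix_([R, V], [R, V])], C[[R, V], I])
print(a_I, b_I, -d / g, 1 / g)
```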



2.7 The orthogonal fit: Eigenvalues 

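The expression for the orthogonal fit coefficients is more
complicated, since it requires knowledge of the eigenvalues
and eigenvectors of the matrix $C$. However, the eigenvalues
of a matrix are the roots of its characteristic polynomial,
and, since $C$ is a $3\times 3$ matrix, this polynomial is a cubic, so
the roots satisfy

$$\lambda^3 - \lambda^2\,{\rm Tr}\,C + \tfrac{1}{2}\left[({\rm Tr}\,C)^2 - {\rm Tr}(C^2)\right]\lambda - {\rm Det}\,C = 0. \qquad (38)$$

This can be solved analytically: since $C$ is real and symmetric,
the roots are real and, ordered so that $\lambda_1 \ge \lambda_2 \ge \lambda_3$, they are

$$\lambda_1 = -2\sqrt{Q}\,\cos\left(\frac{\theta + 2\pi}{3}\right) - \frac{p_2}{3},$$
$$\lambda_2 = -2\sqrt{Q}\,\cos\left(\frac{\theta - 2\pi}{3}\right) - \frac{p_2}{3},$$
$$\lambda_3 = -2\sqrt{Q}\,\cos\left(\frac{\theta}{3}\right) - \frac{p_2}{3}, \qquad (39)$$

where

$$\cos\theta = P/Q^{3/2}, \qquad (40)$$

with

$$P = (p_2/3)^3 - (p_1p_2 - 3p_0)/6, \qquad Q = (p_2/3)^2 - (p_1/3),$$

and

$$p_0 = C_{RR}C_{IV}^2 + C_{VV}C_{IR}^2 + C_{II}C_{RV}^2 - C_{RR}C_{VV}C_{II} - 2C_{IR}C_{IV}C_{RV} = -\lambda_1\lambda_2\lambda_3,$$
$$p_1 = C_{RR}C_{VV} - C_{RV}^2 + C_{RR}C_{II} - C_{IR}^2 + C_{II}C_{VV} - C_{IV}^2 = \lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3,$$
$$p_2 = -(C_{RR} + C_{VV} + C_{II}) = -(\lambda_1 + \lambda_2 + \lambda_3)$$

(e.g. Section 5.6 of Press et al. 2007).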
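The trigonometric solution (38)-(40) can be compared with a general-purpose symmetric eigensolver. This NumPy sketch (`eigenvalues_3x3` is an illustrative name) returns the roots in the order $\lambda_1 \ge \lambda_2 \ge \lambda_3$ used above; the `np.clip` call only guards against round-off pushing $|P/Q^{3/2}|$ slightly above 1, and the sketch assumes the eigenvalues are not all equal ($Q > 0$).

```python
import numpy as np

def eigenvalues_3x3(C):
    """Eigenvalues of a real symmetric 3x3 matrix via equations (38)-(40)."""
    C = np.asarray(C, dtype=float)
    p2 = -np.trace(C)
    p1 = 0.5 * (np.trace(C) ** 2 - np.trace(C @ C))   # sum of principal 2x2 minors
    p0 = -np.linalg.det(C)
    Q = (p2 / 3) ** 2 - p1 / 3
    P = (p2 / 3) ** 3 - (p1 * p2 - 3 * p0) / 6
    theta = np.arccos(np.clip(P / Q ** 1.5, -1.0, 1.0))
    roots = [-2 * np.sqrt(Q) * np.cos((theta + shift) / 3) - p2 / 3
             for shift in (2 * np.pi, -2 * np.pi, 0.0)]
    return np.array(roots)   # (lambda_1, lambda_2, lambda_3), largest first

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
lams = eigenvalues_3x3(C)
print(lams, np.linalg.eigvalsh(C)[::-1])   # eigvalsh returns ascending order
```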

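If we write the eigenvector associated with eigenvalue
$\lambda_i$ as

$$\Lambda_i = \mathbf{r} - a_i\,\mathbf{v} - b_i\,\mathbf{i}, \qquad (41)$$

where $\mathbf{r}$, $\mathbf{v}$ and $\mathbf{i}$ are unit vectors in the size, velocity dispersion,
and surface-brightness directions, then

$$a_i = \frac{C_{RV}/C_{VV}}{1 - \lambda_i/C_{VV}} - b_i\,\frac{C_{IV}/C_{VV}}{1 - \lambda_i/C_{VV}}, \qquad (42)$$

$$b_i = \frac{C_{IR}C_{VV}\,(1 - \lambda_i/C_{VV}) - C_{IV}C_{RV}}{C_{II}C_{VV}\,(1 - \lambda_i/C_{II})(1 - \lambda_i/C_{VV}) - C_{IV}^2}. \qquad (43)$$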


We are particularly interested in the smallest eigenvalue,
since the square root of it gives the intrinsic rms scatter
orthogonal to the Fundamental Plane.

Suppose this eigenvalue is $\lambda_3$. Then the coefficients of
the associated eigenvector are given by inserting $\lambda_3$ in the
expressions above. It is conventional to use $(a_{\rm orth}, b_{\rm orth})$ to
denote $(a_3, b_3)$, so that

$$a_{\rm orth} = \frac{C_{RV}/C_{VV}}{1 - \lambda_3/C_{VV}} - b_{\rm orth}\,\frac{C_{IV}/C_{VV}}{1 - \lambda_3/C_{VV}}, \qquad (44)$$

$$b_{\rm orth} = \frac{C_{IR}C_{VV}\,(1 - \lambda_3/C_{VV}) - C_{IV}C_{RV}}{C_{II}C_{VV}\,(1 - \lambda_3/C_{II})(1 - \lambda_3/C_{VV}) - C_{IV}^2}, \qquad (45)$$

with intrinsic uncertainty well-approximated by
$\langle\Delta R^2_{\rm orth}\rangle^{1/2}/(N_{\rm obj}C_{V|I})^{1/2}$ and $\langle\Delta R^2_{\rm orth}\rangle^{1/2}/(N_{\rm obj}C_{I|V})^{1/2}$,
with $\langle\Delta R^2_{\rm orth}\rangle = (1 + a_{\rm orth}^2 + b_{\rm orth}^2)\,\lambda_3$. Measurement
errors make the squares of these $\chi^2_{\rm obs,orth}\,(O^w_{V|I}/C_{V|I})/C_{V|I}$
and $\chi^2_{\rm obs,orth}\,(O^w_{I|V}/C_{I|V})/C_{I|V}$, where
$\chi^2_{\rm obs,orth} = O_{RR} + a_{\rm orth}^2O_{VV} + b_{\rm orth}^2O_{II} - 2a_{\rm orth}O_{RV} - 2b_{\rm orth}O_{IR} + 2a_{\rm orth}b_{\rm orth}O_{IV}$.

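Notice that, in the thin Plane limit, $\lambda_3 \to 0$, so

$$a_{\rm orth} \to \frac{C_{RV}C_{II} - C_{IV}C_{IR}}{C_{II}C_{VV} - C_{IV}^2}, \qquad (46)$$

$$b_{\rm orth} \to \frac{C_{IR}C_{VV} - C_{IV}C_{RV}}{C_{II}C_{VV} - C_{IV}^2}. \qquad (47)$$

Comparison with equations (19)-(23) shows that, in this limit,
the coefficients of the direct and orthogonal fits are the same
(as they should be).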
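Equations (42)-(47) can be checked by confirming that $(-b_{\rm orth}, 1, -a_{\rm orth})$, in the $(I, R, V)$ ordering of equation (5), really is an eigenvector of $C$ with eigenvalue $\lambda_3$, and that setting $\lambda = 0$ recovers the direct fit. A NumPy sketch with the rounded matrix of equation (6); `eigvec_coeffs` is an illustrative name.

```python
import numpy as np

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2

def eigvec_coeffs(C, lam):
    """(a_i, b_i) of the eigenvector r - a_i v - b_i i for eigenvalue lam, eqs (42)-(43)."""
    b = ((C[I, R] * C[V, V] * (1 - lam / C[V, V]) - C[I, V] * C[R, V])
         / (C[I, I] * C[V, V] * (1 - lam / C[I, I]) * (1 - lam / C[V, V]) - C[I, V] ** 2))
    a = (C[R, V] / C[V, V] - b * C[I, V] / C[V, V]) / (1 - lam / C[V, V])
    return a, b

lam3 = np.linalg.eigvalsh(C)[0]            # smallest eigenvalue
a_orth, b_orth = eigvec_coeffs(C, lam3)    # equations (44)-(45)
a_thin, b_thin = eigvec_coeffs(C, 0.0)     # thin-plane limit, equations (46)-(47)
print(a_orth, b_orth, a_thin, b_thin)
```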

When $C_{IV} \to 0$ then

$$a_{\rm orth} \to \frac{C_{RV}/C_{VV}}{1 - \lambda_3/C_{VV}} \quad {\rm and} \quad b_{\rm orth} \to \frac{C_{IR}/C_{II}}{1 - \lambda_3/C_{II}}. \qquad (48)$$

Since $\lambda_3$ is the smallest eigenvalue, it is smaller than either
$C_{II}$ or $C_{VV}$, so the coefficients of the orthogonal fit
are guaranteed to be larger (in magnitude) than those of the direct fit; in
this limit, this means that they are slightly larger than the
slopes of the simpler pairwise $\langle R|V\rangle$ and $\langle R|I\rangle$ relations. In
practice, $C_{VV} \ll C_{II}$, so this will make $a_{\rm orth} > a_{\rm direct}$ but
$b_{\rm orth} \approx b_{\rm direct}$.

These expressions (e.g. equation 48) make it easy to
understand the effect of restricting the range of $\sigma$ in the
sample, as is done in Hyde & Bernardi (2009b). This will
have the effect of decreasing $C_{VV}$, making $\lambda_3/C_{VV} \to 1$,
thus increasing $a_{\rm orth}$, but leaving $b_{\rm orth}$ essentially unchanged
(see Figure 8 in Hyde & Bernardi 2009b).



2.8 The orthogonal fit: Eigenvectors 

Although we concentrated on the smallest eigenvalue and
its eigenvector, the expressions above are also valid for each
eigenvalue. Thus, if the largest eigenvalue, $\lambda_1$, is much larger
than $C_{VV}$, then the associated eigenvector $\Lambda_1$ will have essentially
no component in the $V$ direction: $a_1 \approx 0$. When this
is the case, as it is for most datasets ($\lambda_1$ must be greater than
$C_{II}$, and $C_{II} \gg C_{VV}$ for most if not all FP datasets), then
the fact that the three eigenvectors are orthogonal allows
us to express the coefficients of the other two eigenvectors
(those in the FP rather than orthogonal to it) as simple
combinations of $a_{\rm orth}$ and $b_{\rm orth}$. Namely, $\Lambda_3\cdot\Lambda_1 = 0$ sets
$b_1 = -1/b_{\rm orth}$, and then $\Lambda_2 \propto \Lambda_1\times\Lambda_3$ sets $a_2$ and $b_2$. This
procedure yields equation (2), illustrating that $C_{II} \gg C_{VV}$
plays a key role.
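To see how good the approximations of equation (2) are for a particular dataset, one can compare them with the eigenvectors returned by a numerical solver. This NumPy sketch uses the rounded matrix of equation (6); the helper names are ours. $\Lambda_3$ is exact by construction, while for the in-plane pair we expect $|\cos|$ close to, but not exactly, 1, because here $\lambda_1$ is only a few times $C_{VV}$.

```python
import numpy as np

C = np.array([[0.0471, -0.0313, 0.0038],
              [-0.0313, 0.0552, 0.0189],
              [0.0038, 0.0189, 0.0187]])
I, R, V = 0, 1, 2

lam, vecs = np.linalg.eigh(C)          # eigenvalues ascending; columns are eigenvectors
lam3 = lam[0]
# Orthogonal-fit coefficients from equations (44)-(45).
b_o = ((C[I, R] * (C[V, V] - lam3) - C[I, V] * C[R, V])
       / ((C[I, I] - lam3) * (C[V, V] - lam3) - C[I, V] ** 2))
a_o = (C[R, V] - b_o * C[I, V]) / (C[V, V] - lam3)

# Approximations of equation (2), written in the (I, R, V) component order.
approx = {
    "Lambda_3": np.array([-b_o, 1.0, -a_o]),
    "Lambda_2": np.array([-b_o, 1.0, (1 + b_o**2) / a_o]),
    "Lambda_1": np.array([1 / b_o, 1.0, 0.0]),
}
exact = {"Lambda_3": vecs[:, 0], "Lambda_2": vecs[:, 1], "Lambda_1": vecs[:, 2]}

def abs_cos(u, v):
    """|cos| of the angle between two vectors (eigenvector signs are arbitrary)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

for name in approx:
    print(name, abs_cos(approx[name], exact[name]))
```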



2.9 The FP with normalized variables 

One might argue that the real Plane of interest is the one
obtained by normalizing all observables by their rms values.
This means that we are interested in the eigenvalues
and vectors of $\mathcal{R}$ (cf. equation 8). The coefficients of the direct
fit become $a_{\rm direct} = (r_{RV} - r_{IR}r_{IV})/(1 - r_{IV}^2) = 0.678$
and $b_{\rm direct} = (r_{IR} - r_{RV}r_{IV})/(1 - r_{IV}^2) = -0.700$. The three
eigenvalues are 0.081, 1.123, 1.796 and the associated orthogonal
fit coefficients are $(a_{\rm orth}, b_{\rm orth}) = (0.75, -0.77)$.

This Plane is easy to understand if we set $r_{IV} = 0$ (this
is analogous to our setting $C_{IV}/C_{II} \to 0$). Then



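$$\mathcal{R} \to \begin{pmatrix} 1 & r_{IR} & 0\\ r_{IR} & 1 & r_{RV}\\ 0 & r_{RV} & 1\end{pmatrix}. \qquad (49)$$

The associated eigenvalues are $1$ and $1 \pm s$, where $s \equiv \sqrt{r_{IR}^2 + r_{RV}^2}$,
with eigenvectors

$$\Lambda_3 = \mathbf{i} - \frac{s}{r_{IR}}\,\mathbf{r} + \frac{r_{RV}}{r_{IR}}\,\mathbf{v} \quad (\lambda_3 = 1 - s), \qquad (50)$$
$$\Lambda_2 = \mathbf{i} - \frac{r_{IR}}{r_{RV}}\,\mathbf{v} \quad (\lambda_2 = 1), \qquad (51)$$
$$\Lambda_1 = \mathbf{i} + \frac{s}{r_{IR}}\,\mathbf{r} + \frac{r_{RV}}{r_{IR}}\,\mathbf{v} \quad (\lambda_1 = 1 + s). \qquad (52)$$

Since $r_{IR} \approx -r_{RV}$ (so that $s/r_{IR} \approx -\sqrt{2}$ and $r_{RV}/r_{IR} \approx -1$), this reduces further to

$$\Lambda_3 \approx \mathbf{i} + \sqrt{2}\,\mathbf{r} - \mathbf{v}, \qquad (53)$$
$$\Lambda_2 \approx \mathbf{i} + \mathbf{v}, \qquad (54)$$
$$\Lambda_1 \approx \mathbf{i} - \sqrt{2}\,\mathbf{r} - \mathbf{v}. \qquad (55)$$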

Notice that the equation for this FP is rather different than 
when the observables were not normalized by their rms val- 
ues. 
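The eigenpairs (50)-(52) are easy to verify numerically. The sketch below assumes NumPy and uses illustrative values $r_{IR} = -0.6$, $r_{RV} = 0.6$ (our choice, to mimic $r_{IR} \approx -r_{RV}$); in that case the smallest-eigenvalue vector is exactly $\mathbf{i} + \sqrt{2}\,\mathbf{r} - \mathbf{v}$. The helper name is hypothetical.

```python
import numpy as np

def normalized_plane_eigen(r_ir, r_rv):
    """Eigenpairs of equation (49) (order I, R, V) from equations (50)-(52)."""
    s = np.hypot(r_ir, r_rv)
    return [
        (1 - s, np.array([1.0, -s / r_ir, r_rv / r_ir])),   # Lambda_3
        (1.0,   np.array([1.0, 0.0, -r_ir / r_rv])),        # Lambda_2
        (1 + s, np.array([1.0, s / r_ir, r_rv / r_ir])),    # Lambda_1
    ]

r_ir, r_rv = -0.6, 0.6      # illustrative values with r_IR = -r_RV (an assumption)
Rm = np.array([[1.0, r_ir, 0.0],
               [r_ir, 1.0, r_rv],
               [0.0, r_rv, 1.0]])
for lam, vec in normalized_plane_eigen(r_ir, r_rv):
    print(lam, vec, np.max(np.abs(Rm @ vec - lam * vec)))
```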

2.10 When one correlation is due to the other two 

The previous section showed the simplifications which are
possible if one of the pairwise correlations vanishes. The
other case of interest is when one of the correlations is
entirely due to the other two. An example of this is the
color-$\sigma$-luminosity relation: the color-luminosity correlation
is entirely due to those between color and $\sigma$ and between $\sigma$ and luminosity
(Bernardi et al. 2005). In this case, $r_{CL} = r_{CV}\,r_{VL}$, so

$$\mathcal{R} = \begin{pmatrix} 1 & r_{CV} & r_{CV}r_{VL}\\ r_{CV} & 1 & r_{VL}\\ r_{CV}r_{VL} & r_{VL} & 1\end{pmatrix}, \qquad (56)$$

where $C$, $V$ and $L$ denote color, $\log\sigma$ and $\log({\rm luminosity})$,
so $p_0 = -(r_{CV}^2 - 1)(r_{VL}^2 - 1)$, $p_1 = 3 - r_{CV}^2 - r_{VL}^2 - r_{CV}^2r_{VL}^2$,
and $p_2 = -3$. This makes $P = -r_{CV}^2r_{VL}^2$ and $Q = (r_{CV}^2 +
r_{VL}^2 + r_{CV}^2r_{VL}^2)/3$. Unfortunately, the expressions for the
eigenvalues and vectors which result are complicated, and
not very intuitive.

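However, they simplify if $r_{CV} = r_{VL}$, in which case
the three eigenvalues are $(r_{CV}^2 + r_{CV}\sqrt{r_{CV}^2 + 8} + 2)/2$, $1 -
r_{CV}^2$, and $(r_{CV}^2 - r_{CV}\sqrt{r_{CV}^2 + 8} + 2)/2$, and the associated
eigenvectors are

$$\Lambda_1 = \mathbf{l} + \mathbf{c} + \frac{\sqrt{r_{CV}^2 + 8} - r_{CV}}{2}\,\mathbf{v}, \qquad (57)$$
$$\Lambda_2 = \mathbf{l} - \mathbf{c}, \qquad (58)$$
$$\Lambda_3 = \mathbf{l} + \mathbf{c} - \frac{\sqrt{r_{CV}^2 + 8} + r_{CV}}{2}\,\mathbf{v}, \qquad (59)$$

where $\mathbf{c}$, $\mathbf{v}$ and $\mathbf{l}$ are unit vectors in the color, $\log\sigma$ and $\log({\rm luminosity})$ directions.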
Unfortunately, this is not so useful for interpreting the
SDSS data, which have $r_{VL} \sim 0.8$ and $r_{CV} \sim 0.5$. This
is one example of where direct analysis of the elements of
the covariance matrix is more interesting, and provides more
insight, than analysis of its principal components.



3 DIFFERENTIAL EVOLUTION EFFECTS 

Our analysis shows that the form of the $z = 0$ FP is largely
a consequence of the fact that the distribution of surface
brightness is much broader than that of velocity dispersion,
and surface brightness and velocity dispersion are almost uncorrelated.
In passive differential evolution models, in which the luminosities
of the lower mass galaxies are assumed to evolve
fastest while sizes and velocity dispersions do not change,
this is an accident: surface brightness and velocity dispersion
should no longer be uncorrelated at $z > 0$. As a result,
the coefficients of the FP are expected to evolve. The following
simple example illustrates.

3.1 Passive luminosity evolution 



Suppose that

$$L_z = L_0\,(1 + z)^{\alpha(M_{\rm dyn})}, \qquad (60)$$

where $M_{\rm dyn} \propto R\sigma^2$ is the same at all redshifts, and

$$\alpha(M_{\rm dyn}) = \alpha_* - \beta_*\,(M_d - \langle M_d\rangle), \qquad (61)$$

with $M_d \equiv \log M_{\rm dyn}$. The sign has been chosen so that $\beta_* > 0$ means massive
galaxies evolve less rapidly. Then, at redshift $z$, the
slope of the relation between log(dynamical mass) and
log(luminosity) will be

$$\frac{C_{L_zM_d}}{C_{M_dM_d}} = \frac{C_{L_0M_d}}{C_{M_dM_d}} - \beta_*\log(1 + z). \qquad (62)$$

This shows that the slope will decrease at high $z$ if $\beta_* > 0$
(i.e., if massive galaxies evolve less rapidly). As a result, the
slope of $\langle\log(M_{\rm dyn}/L)|M_{\rm dyn}\rangle$ (which is one minus the
number on the right hand side of the expression above) will
steepen at higher $z$ for positive $\beta_*$.

Similarly, although $C_{RV}$, $C_{RM_d}$ and $C_{VM_d}$ do not
evolve, correlations which involve luminosity do. For example,
at redshift $z$, the correlation between surface brightness
and velocity dispersion becomes

$$ C_{I_zV} = C_{L_zV} - 2C_{RV} = C_{I_0V} - \beta_*\log(1+z)\,C_{VM_d}; \qquad (63) $$

since $C_{VM_d} > 0$ and $C_{I_0V} \approx 0$, we expect $C_{I_zV}$ to have the opposite
sign to $\beta_*$. In particular, for $\beta_* > 0$ we expect $C_{I_zV} < 0$, so
equation (19) implies that $a_{\rm direct}(z) < a_{\rm direct}(0)$ if
$C_{I_zR}/C_{I_zV} > C_{RV}/C_{VV}$. Since $C_{RV} \sim C_{VV}$, this means that we would
like to know if $C_{I_zR} > C_{I_zV}$. A little algebra, combined
with the facts that $C_{IV} \approx 0$, $C_{VV} < C_{RR}$ and $C_{RV} \sim C_{VV}$,
shows that $a_{\rm direct}(z) < a_{\rm direct}(0)$ if $\beta_* > 0$. A similar analysis
of equation (23) shows that $b_{\rm direct}$ too decreases with $z$
if $\beta_* > 0$. However, note that for $\beta_* > 0$ the distribution
of surface brightnesses widens (i.e., $C_{I_zI_z} > C_{I_0I_0}$), meaning
$C_{I_zI_z} \gg C_{VV}$, so, even though $a_{\rm orth}$ and $b_{\rm orth}$ both change,
equation ([2]) continues to describe the Plane well.
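The step leading to the final form of equation (63) can be written out explicitly. In logarithmic units $I = L - 2R$, and the log form of equations (60)–(61) gives $L_z$; using the shorthand $\beta_z \equiv \beta_*\log(1+z)$,

$$
\begin{aligned}
L_z &= L_0 + \big[\alpha_* - \beta_*\,(M_d - \langle M_d\rangle)\big]\log(1+z),\\
C_{L_zV} &= C_{L_0V} - \beta_z\,C_{VM_d},\\
C_{I_zV} &= C_{L_zV} - 2C_{RV} = \big(C_{L_0V} - 2C_{RV}\big) - \beta_z\,C_{VM_d} = C_{I_0V} - \beta_z\,C_{VM_d}.
\end{aligned}
$$

Since $R$ and $V$ do not evolve, only the $\beta_z C_{VM_d}$ term is new; with $C_{I_0V} \approx 0$ and $C_{VM_d} > 0$ it makes the sign of $C_{I_zV}$ opposite to that of $\beta_*$.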

Notice that, in such models, the evolved values of $(a, b)$
depend on the change in the slope of the mass-to-light
ratio. This is shown in Figure 1, where we have also shown
the expected relation for the orthogonal fit coefficients, to
illustrate that they behave similarly. Though we have not
shown it here, the intrinsic scatter also changes slightly: if
we define $\beta_z \equiv \beta_*\log_{10}(1+z)$, then $\langle\Delta^2_{\rm direct}\rangle^{1/2}$ decreases
from about 0.1 at $\beta_z = 0$ to 0.07 at $\beta_z = 0.5$, whereas $\lambda_3^{1/2}$
increases from about 0.053 to 0.058.

[Figure 1: $a$ and $b$ plotted against $\beta_*\log_{10}(1+z)$.]

Figure 1. Relation between FP parameters $a$ (solid) and $b$
(dashed) and the change in the slope of the dynamical mass-to-light
ratio in a model in which only luminosities evolve, and this
evolution depends on dynamical mass at $z = 0$: massive galaxies
evolve less rapidly. Upper (thick) solid and dashed curves are
for the orthogonal fit; the thinner solid and dashed curves are
for the direct fit. The filled circle and associated error bar show
the measurement of Jørgensen et al. (2006).

For comparison, the filled circle shows a measurement
of these quantities at $z \sim 0.85$, from Jørgensen et al.
(2006). (In fact, we have only shown their measurement
of the change in slope of $\langle M_{\rm dyn}/L|M_{\rm dyn}\rangle$, $0.3 \pm 0.08$, versus
their measurement of $-b = 0.7 \pm 0.07$, which is close
to what we call $-b_{\rm orth}$. They also report $a = 0.6 \pm 0.22$,
which would be displaced slightly downwards on our plot,
and would carry substantially larger uncertainties than the
single point we have shown.) Note that their measurement of
the change in slope implies $\beta_* \approx 0.3/\log_{10}(1.9) \approx 1.07$.
They also report little change in the thickness of the plane,
which is consistent with the numbers given above. If this
is indeed the right picture, then the luminosity function
at $z$ should be narrower by a factor of

$$ \frac{C_{L_zL_z}}{C_{L_0L_0}} = 1 - 2\beta_z\,\frac{C_{L_0M_d}}{C_{L_0L_0}} + \beta_z^2\,\frac{C_{M_dM_d}}{C_{L_0L_0}} \approx 0.5. $$
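The two numerical statements in this paragraph can be sketched as follows. The first line only reproduces the arithmetic $\beta_* \approx 0.3/\log_{10}(1.9)$. The rest evaluates the variance ratio above on a synthetic sample whose covariances are illustrative assumptions (so it is not expected to return the value 0.5, which depends on the measured covariances), and compares it with the ratio obtained by evolving each object directly; the helper `lf_width_ratio` is a hypothetical name.

```python
import math
import random


def cov(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


def lf_width_ratio(beta_z, c_lm, c_mm, c_ll):
    """C_{LzLz}/C_{L0L0} = 1 - 2 beta_z C_{L0Md}/C_{L0L0} + beta_z^2 C_{MdMd}/C_{L0L0}."""
    return 1.0 - 2.0 * beta_z * c_lm / c_ll + beta_z ** 2 * c_mm / c_ll


# Slope change of 0.3 at 1 + z = 1.9, as quoted in the text.
beta_star = 0.3 / math.log10(1.9)
print(f"beta_* = {beta_star:.3f}")

# Assumed toy sample: Md = log dynamical mass, L0 = log luminosity at z = 0.
random.seed(2)
Md = [random.gauss(11.0, 0.4) for _ in range(20000)]
L0 = [0.9 * m + random.gauss(0.0, 0.15) for m in Md]
mean_Md = sum(Md) / len(Md)

beta_z = beta_star * math.log10(1.9)   # equals 0.3 by construction
# Differential part of the evolution only; the alpha_* term is a constant shift
# and does not change the width of the distribution.
Lz = [l - beta_z * (m - mean_Md) for l, m in zip(L0, Md)]

ratio_formula = lf_width_ratio(beta_z, cov(L0, Md), cov(Md, Md), cov(L0, L0))
ratio_direct = cov(Lz, Lz) / cov(L0, L0)
print(f"C_LzLz / C_L0L0: formula {ratio_formula:.4f}, direct {ratio_direct:.4f}")
```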

Before we move on, it is worth remarking on the fact
that differential luminosity evolution changes $a$ more than
$b$. Naively, this is surprising, since $a_{\rm direct} \approx C_{RV}/C_{VV}$ at
$z = 0$, so one might have thought it would not be changed
at all if neither $R$ nor $V$ change. Moreover, one might have
expected $b$ to change, perhaps strongly, because the luminosity
evolution would change both $C_{IR}$ and $C_{II}$. To see why $b$
changes only weakly, note that $\beta_* > 0$ means that the distribution
of $L$ was narrower at high $z$. In the limit in which
all objects have the same luminosity, $C_{IR}/C_{II} = -1/2$;
thus, differential evolution cannot force $|b|$ below 1/2. Since
$|b| \approx 0.8$ at $z = 0$, and it cannot become smaller than 1/2,
the evolution in $b$ is weak. Thus, our analysis shows that $a$
is more strongly affected than $b$ because luminosity evolution
makes $C_{IV} \neq 0$ at higher $z$, and because differential
evolution makes the distribution of $L$ narrower in the past.
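The limiting value follows from $I = L - 2R$ in logarithmic units, as in equation (63): when every object has the same $L$, $L$ contributes no variance or covariance, so

$$ C_{IR} = C_{LR} - 2C_{RR} = -2C_{RR}, \qquad C_{II} = C_{LL} - 4C_{LR} + 4C_{RR} = 4C_{RR}, \qquad \frac{C_{IR}}{C_{II}} = -\frac{1}{2}. $$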



3.2 Selection effects and structural evolution 

While consistent with the measurements, pure (differential)
luminosity evolution is not required by them. For example,
the expected form of this evolution implies a narrower
distribution of $L$ at high redshift. Since a magnitude-limited
selection effect would also produce a narrower distribution
of $L$, one must first be sure that this is not producing the
observed changes in $a$ and $b$. In particular, Figure 7 in Hyde
& Bernardi (2009b) shows that removing faint galaxies from
the $z \sim 0$ sample decreases $a$ and $|b|$. Since this is qualitatively
the same as the change in the FP coefficients between
$z = 0$ and $z = 0.9$, statements about differential evolution
should only be believed if accompanied by measurements of
a change in the slope of the size–$L$ and $\sigma$–$L$ relations; the
FP itself is a very poor diagnostic.

Moreover, the analysis above assumes that only the luminosities
evolve. However, there is much recent discussion
of the finding that, at fixed stellar mass, galaxies appear to
be more than three times smaller at $z \sim 2$ than at $z \sim 0$
(e.g. Trujillo et al. 2006; Cimatti et al. 2008; van Dokkum
et al. 2008), although the evidence is not uncontested (e.g.
Mancini et al. 2010; Saracco et al. 2010). Indeed, Saglia et al.
(2010) interpret their measurements of the evolution of the
Fundamental Plane entirely in terms of structural evolution,
rather than differential evolution of luminosity!

At fixed $M_{\rm dyn}$, they find that the sizes are slightly
smaller and velocity dispersions slightly larger at $z \sim 0.8$
than at $z \sim 0$. While the redshift dependence they report
is in quantitative agreement with that derived by Bernardi
(2009) from a substantially larger dataset restricted to a narrower
redshift range ($z < 0.3$), we must again worry about
selection effects on these estimates of structural evolution.
For example, suppose that the evolution was purely in the
luminosities, and it was not differential, but the high-$z$
measurements only see the largest $L$. Then, because both $R$ and
$M_{\rm dyn}$ correlate with $L$, the $R$–$M_{\rm dyn}$ relation will be biased
by this selection on $L$ (even though $L$ does not enter explicitly
in the $\langle R|M_{\rm dyn}\rangle$ relation). In addition, relating the
high-$z$ measurements to those at $z = 0$ requires a better
understanding of the systematic differences in band-passes, of
how the velocity dispersion measurement at high $z$ relates
to the one at $z = 0$ (e.g., effective aperture effects), and
of whether or not the high-$z$ population really is made up
of the progenitors of the $z = 0$ population. Exploring this
further (e.g., how should one account for the fact that the
youngest members of the $z = 0$ population simply did not
exist at $z \sim 1$? What role do mergers play?), in the context
of differential evolution models, is the subject of work
in progress.
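The bias described above is easy to reproduce in a toy Monte Carlo. The sketch assumes standardized Gaussian variables for $\log R$, $\log M_{\rm dyn}$ and $\log L$ with illustrative correlations (0.7 between $R$ and $M_{\rm dyn}$, 0.8 between $R$ and $L$, 0.6 between $M_{\rm dyn}$ and $L$), and a hypothetical selection that keeps only objects more than one standard deviation above the mean luminosity. None of these numbers are measurements.

```python
import math
import random


def cov(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


# Assumed correlations between standardized variables:
# r(R, Mdyn) = 0.7, r(R, L) = 0.8, r(Mdyn, L) = 0.6.
# Correlated Gaussians are built from independent ones (a Cholesky-type construction).
w2 = (0.8 - 0.6 * 0.7) / 0.8              # weight giving r(R, L) = 0.8
w3 = math.sqrt(1.0 - 0.7 ** 2 - w2 ** 2)  # weight giving var(R) = 1

random.seed(3)
Md, L, R = [], [], []
for _ in range(100000):
    g1, g2, g3 = (random.gauss(0.0, 1.0) for _ in range(3))
    Md.append(g1)
    L.append(0.6 * g1 + 0.8 * g2)
    R.append(0.7 * g1 + w2 * g2 + w3 * g3)

slope_all = cov(R, Md) / cov(Md, Md)

# Hypothetical high-z selection: only objects with L more than 1 sigma above the mean.
keep = [i for i in range(len(L)) if L[i] > 1.0]
R_sel = [R[i] for i in keep]
Md_sel = [Md[i] for i in keep]
slope_sel = cov(R_sel, Md_sel) / cov(Md_sel, Md_sel)

print(f"<R|Mdyn> slope, full sample: {slope_all:.3f}")
print(f"<R|Mdyn> slope, L-selected:  {slope_sel:.3f}")
```

For these assumed correlations the slope of the selected sample comes out noticeably shallower than in the full sample, even though $L$ never enters the fit; how large the shift is depends on how strongly $R$ and $M_{\rm dyn}$ each correlate with $L$.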



4 DISCUSSION 

We started from a general expression for the conditional
distribution of $n$ correlated variables when $N - n$ other
variables are known (equations [3] and [4]), and specialized to the
case $N = 3$. This provided analytic expressions which
describe the Fundamental Plane associated with three
correlated variables. Our expressions allow one to see why the
coefficients of the direct, inverse and orthogonal fits differ
(equations \TWZ3[ [2"M3T1 144H45I and Tabled]); how to esti- 
mate the uncertainties on these coefficients; why the three 
eigenvectors which describe the FP have the form they do
(equation [2] and Section 2.8); and how and why the
Fundamental Plane in a magnitude-limited survey will, in
general, differ from that in a complete sample (Appendix).

If one views all pairwise correlations as having a compo- 
nent that is due to the individual correlations between each 
observable and luminosity, and another component which is 
not, then our analysis shows that only the part which is not 
due to the correlations with luminosity remains unaffected 
by the magnitude-limited selection: the other part is biased
(e.g., equation A7). Our analysis also shows how to remove
this bias, as well as account for measurement errors. By pro- 
viding analytic expressions for all quantities of interest, our 
results remove the need for numerical nonlinear minimiza- 
tion methods for obtaining the best-fit coefficients. These 
results were used by Hyde & Bernardi (2009b) in their anal- 
ysis of the SDSS Fundamental Plane. 
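To illustrate why no nonlinear minimization is needed for the orthogonal fit, one can treat it as a linear-algebra problem: the normal to the best-fitting plane is the eigenvector of the $3\times3$ covariance matrix of $(R, V, I)$ with the smallest eigenvalue. The sketch below finds that eigenvector by inverse iteration (one standard choice; any symmetric eigen-solver would do) and converts it to $R = aV + bI$. Purely for illustration, it uses the SDSS covariances quoted in Appendix A2, taken at face value; the helper functions are hypothetical.

```python
import math


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def smallest_eigenvector(cmat, iters=200):
    """Inverse iteration: repeated multiplication by C^-1 amplifies the
    eigenvector of C with the smallest eigenvalue."""
    inv = inverse3(cmat)
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = matvec(inv, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(cmat, v)))  # Rayleigh quotient
    return v, lam


# Covariances of (R, V, I) quoted in Appendix A2.
C_RR, C_VV, C_II = 0.0488, 0.0127, 0.2660 / 2.5 ** 2
C_RV, C_IR, C_IV = 0.0159, -0.0820 / 2.5, -0.0036 / 2.5
C = [[C_RR, C_RV, C_IR],
     [C_RV, C_VV, C_IV],
     [C_IR, C_IV, C_II]]

v, lam3 = smallest_eigenvector(C)
n_R, n_V, n_I = v                          # plane: n_R R + n_V V + n_I I = const
a_orth, b_orth = -n_V / n_R, -n_I / n_R
print(f"a_orth = {a_orth:.3f}, b_orth = {b_orth:.3f}, rms orthogonal scatter = {math.sqrt(lam3):.3f}")
```

The overall sign of the eigenvector is arbitrary, but it cancels in the ratios that define $a$ and $b$. The same routine can be fed error-corrected or selection-corrected covariances without any other change.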

Many properties of the Fundamental Plane at $z = 0$
can be understood as arising from the fact that surface
brightness and velocity dispersion are uncorrelated. This
raises the question of whether or not this lack of correlation
encodes something fundamental about the physics of
galaxy formation. Recent work suggests that the coefficients
of the Fundamental Plane at $z = 0.8$ are significantly different
from those at $z = 0$ (di Serego Alighieri et al. 2006;
Jørgensen et al. 2006). We showed that, in models where
massive galaxies evolve less rapidly than low mass galaxies,
but there are no changes to the sizes or velocity dispersions,
there is a one-to-one relation between the changes to $(a, b)$
and the correlation between luminosity and mass (Figure 1).
(We also showed that, even though $(a, b)$ change, the relationship
between the eigenvectors of the Plane (equation [2])
does not.) This relation, which is in reasonable agreement
with the measurements, also predicts that $C_{IV} \neq 0$ at higher
$z$; i.e., in this model, $C_{IV} \approx 0$ at $z = 0$ is just a coincidence.

While consistent with the FP measurements, pure (differential)
luminosity evolution is not required by them. For example,
a selection effect on luminosity will produce qualitatively
similar changes to $a$ and $b$, making the FP a very poor diagnostic
of this sort of evolution; the size–$L$ and $\sigma$–$L$ relations
are much better. Moreover, other scaling relations suggest
there has been substantial structural evolution since $z \sim 1$.
Again, selection effects complicate the relationship between
the observed changes to $a$ and $b$ and the structural evolution
parameters. Accounting for these is the subject of work in
progress, but we note that if $C_{IV}$ remains small even at high
$z$, then this will provide a simple way to constrain models
of the structural changes that complement differential luminosity
evolution.
APPENDIX A: BIASES FROM THE
FLUX-LIMITED SELECTION EFFECT 

The discussion in the main text can be worked through for
the case of an apparent magnitude-limited survey in which
one does not weight objects by (the inverse of) $V_{\rm max}(L)$.
In essence, all one must do is determine the change to the
elements of the covariance matrix if all objects have the same
weight. Although the main text worked with luminosity in
solar units, rather than absolute magnitudes, the analysis in
this Appendix uses magnitudes. We use $M \propto -2.5\log_{10}(L)$
for absolute magnitude (not to be confused with $M_d$
in the main text, which we used for dynamical mass), and
so now surface brightness is $I \propto M + 5R$.



A1 Quantifying the bias

If we use $\bar X$ and $\bar C_{XY}$ to denote the means and (error-corrected)
covariances in the observed sample (i.e. equations [10] and [13]
with $w_i = 1$ for all $i$), then the fact that
$\bar C_{XY} \neq C_{XY}$ for all pairs $XY$ means that the coefficients of
the Fundamental Plane are sensitive to selection effects, so
care must be taken when estimating its shape. When there
is no curvature in the underlying pairwise scaling relations,
this is straightforward, as we show below. In essence,
all that is really required is an estimate of how the mean and
the width of the observed luminosity distribution are affected
by the magnitude-limited selection.

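For example, the differences between the selection-biased
and intrinsic mean values are given by

$$ \bar R - \langle R\rangle = \frac{C_{RM}}{C_{MM}}\,\big(\bar M - \langle M\rangle\big), \qquad
\bar V - \langle V\rangle = \frac{C_{VM}}{C_{MM}}\,\big(\bar M - \langle M\rangle\big), \qquad
\bar I = \bar M + 5\bar R, \qquad (A1) $$

where $\langle M\rangle$ etc. denote the true mean values (i.e., those in
which the selection effect has been accounted for). Similarly,
the selection-biased covariances are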



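$$ \begin{aligned}
\bar C_{RM} &= \frac{C_{RM}}{C_{MM}}\,\bar C_{MM}, \qquad \bar C_{VM} = \frac{C_{VM}}{C_{MM}}\,\bar C_{MM},\\
\bar C_{RR} &= C_{RR} + \frac{C_{RM}^2}{C_{MM}^2}\,\big(\bar C_{MM} - C_{MM}\big),\\
\bar C_{VV} &= C_{VV} + \frac{C_{VM}^2}{C_{MM}^2}\,\big(\bar C_{MM} - C_{MM}\big),\\
\bar C_{RV} &= C_{RV} + \frac{C_{RM}C_{VM}}{C_{MM}^2}\,\big(\bar C_{MM} - C_{MM}\big),
\end{aligned} \qquad (A2) $$

from which one can compute

$$ \bar C_{IM} = \bar C_{MM} + 5\bar C_{RM}, \quad \bar C_{IR} = \bar C_{RM} + 5\bar C_{RR}, \quad \bar C_{IV} = \bar C_{VM} + 5\bar C_{RV}, \quad \bar C_{II} = \bar C_{MM} + 10\bar C_{RM} + 25\bar C_{RR}. \qquad (A3) $$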

This shows that scaling relations at fixed $M$ are not affected
by the selection effect: $\bar C_{RM}/\bar C_{MM} = C_{RM}/C_{MM}$, etc. For
the other relations, the differences from when weighting
is used depend on how different $\bar C_{MM}$, the variance in
the observed luminosity distribution, is from the intrinsic
variance, $C_{MM}$. This difference will differ from one sample
to another: we will quantify it for the SDSS sample shortly.
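Equation (A2) can be tested against a simulated magnitude-limited sample. The sketch assumes a trivariate Gaussian population for $(M, R, V)$ with unit variances and illustrative correlations $r_{RM} = -0.8$, $r_{VM} = -0.7$, $r_{RV} = 0.7$ (the no-curvature case), applies a hypothetical cut $M < -0.5$, and compares the covariances of the surviving objects with the (A2) predictions, which need only the full-sample covariances and the observed $\bar C_{MM}$. The helper names are hypothetical.

```python
import math
import random


def cov(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


def sample_cov(M, R, V):
    """The covariances that enter equation (A2), as a dict."""
    return {"MM": cov(M, M), "RM": cov(R, M), "VM": cov(V, M),
            "RR": cov(R, R), "VV": cov(V, V), "RV": cov(R, V)}


def biased_covariances(C, cbar_mm):
    """Equation (A2): selection-biased covariances from intrinsic ones."""
    d = cbar_mm - C["MM"]
    return {
        "MM": cbar_mm,
        "RM": C["RM"] / C["MM"] * cbar_mm,
        "VM": C["VM"] / C["MM"] * cbar_mm,
        "RR": C["RR"] + C["RM"] ** 2 / C["MM"] ** 2 * d,
        "VV": C["VV"] + C["VM"] ** 2 / C["MM"] ** 2 * d,
        "RV": C["RV"] + C["RM"] * C["VM"] / C["MM"] ** 2 * d,
    }


# Assumed population: M = absolute magnitude, R = log size, V = log dispersion,
# unit variances, r_RM = -0.8, r_VM = -0.7, r_RV = 0.7.
w2 = (0.7 - 0.56) / 0.6                  # weight giving r_RV = 0.7
w3 = math.sqrt(1.0 - 0.49 - w2 ** 2)     # weight giving var(V) = 1
random.seed(4)
M, R, V = [], [], []
for _ in range(100000):
    g1, g2, g3 = (random.gauss(0.0, 1.0) for _ in range(3))
    M.append(g1)
    R.append(-0.8 * g1 + 0.6 * g2)
    V.append(-0.7 * g1 + w2 * g2 + w3 * g3)

full = sample_cov(M, R, V)
# Hypothetical magnitude limit: keep objects brighter (more negative M) than -0.5.
sel = [i for i in range(len(M)) if M[i] < -0.5]
observed = sample_cov([M[i] for i in sel], [R[i] for i in sel], [V[i] for i in sel])
predicted = biased_covariances(full, observed["MM"])

for key in ("RM", "VM", "RR", "VV", "RV"):
    print(f"C_{key}: full {full[key]:+.3f}, selected {observed[key]:+.3f}, (A2) {predicted[key]:+.3f}")
```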
A2 Correcting the bias

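These expressions can be rearranged to express the correct
intrinsic correlations in terms of the selection-biased ones:

$$ \begin{aligned}
C_{RM} &= \frac{\bar C_{RM}}{\bar C_{MM}}\,C_{MM}, \qquad C_{VM} = \frac{\bar C_{VM}}{\bar C_{MM}}\,C_{MM},\\
C_{RR} &= \bar C_{RR} - \frac{\bar C_{RM}^2}{\bar C_{MM}^2}\,\big(\bar C_{MM} - C_{MM}\big),\\
C_{VV} &= \bar C_{VV} - \frac{\bar C_{VM}^2}{\bar C_{MM}^2}\,\big(\bar C_{MM} - C_{MM}\big),\\
C_{RV} &= \bar C_{RV} - \frac{\bar C_{RM}\bar C_{VM}}{\bar C_{MM}^2}\,\big(\bar C_{MM} - C_{MM}\big).
\end{aligned} \qquad (A4) $$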



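The intrinsic correlations with $I$ can then be obtained from

$$ C_{IM} = C_{MM} + 5C_{RM}, \quad C_{IR} = C_{RM} + 5C_{RR}, \quad C_{IV} = C_{VM} + 5C_{RV}, \quad C_{II} = C_{MM} + 10C_{RM} + 25C_{RR}, \qquad (A5) $$

with mean values

$$ \langle R\rangle = \bar R - \frac{C_{RM}}{C_{MM}}\,\big(\bar M - \langle M\rangle\big), \qquad
\langle V\rangle = \bar V - \frac{C_{VM}}{C_{MM}}\,\big(\bar M - \langle M\rangle\big), \qquad
\langle I\rangle = \langle M\rangle + 5\langle R\rangle. \qquad (A6) $$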
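The correction can be sketched as a single function implementing (A4)–(A6), given the observed means and covariances and an estimate of the intrinsic $\langle M\rangle$ and $C_{MM}$ (for example from a $V_{\rm max}$-weighted luminosity function). The usage lines bias a set of assumed intrinsic covariances with (A1)–(A2), borrowing only the magnitude moments from (A8), and check that the function recovers them; `correct_selection` is a hypothetical name and the covariance values are illustrative.

```python
def correct_selection(obs, mean_obs, mean_M_true, c_mm_true):
    """Equations (A4)-(A6): intrinsic covariances and means from
    magnitude-limited ones. obs: keys 'MM', 'RM', 'VM', 'RR', 'VV', 'RV';
    mean_obs: keys 'M', 'R', 'V'."""
    d = obs["MM"] - c_mm_true                    # Cbar_MM - C_MM
    s_r = obs["RM"] / obs["MM"]                  # slopes at fixed M are unbiased
    s_v = obs["VM"] / obs["MM"]
    C = {"MM": c_mm_true, "RM": s_r * c_mm_true, "VM": s_v * c_mm_true,
         "RR": obs["RR"] - s_r ** 2 * d,
         "VV": obs["VV"] - s_v ** 2 * d,
         "RV": obs["RV"] - s_r * s_v * d}
    # (A5): correlations involving I = M + 5R
    C["IM"] = C["MM"] + 5 * C["RM"]
    C["IR"] = C["RM"] + 5 * C["RR"]
    C["IV"] = C["VM"] + 5 * C["RV"]
    C["II"] = C["MM"] + 10 * C["RM"] + 25 * C["RR"]
    # (A6): intrinsic mean values
    dM = mean_obs["M"] - mean_M_true
    means = {"M": mean_M_true,
             "R": mean_obs["R"] - s_r * dM,
             "V": mean_obs["V"] - s_v * dM}
    means["I"] = means["M"] + 5 * means["R"]
    return C, means


# Round trip. Intrinsic covariances and <R>, <V> below are ASSUMED values;
# the magnitude moments are those of equation (A8).
true = {"MM": 0.76, "RM": -0.18, "VM": -0.06, "RR": 0.06, "VV": 0.013, "RV": 0.016}
true_means = {"M": -20.99, "R": 0.5, "V": 2.3}
cbar_mm, mbar = 0.65, -21.94

# Bias them with equations (A1)-(A2) ...
d = cbar_mm - true["MM"]
sr, sv = true["RM"] / true["MM"], true["VM"] / true["MM"]
obs = {"MM": cbar_mm, "RM": sr * cbar_mm, "VM": sv * cbar_mm,
       "RR": true["RR"] + sr ** 2 * d,
       "VV": true["VV"] + sv ** 2 * d,
       "RV": true["RV"] + sr * sv * d}
dM = mbar - true_means["M"]
obs_means = {"M": mbar, "R": true_means["R"] + sr * dM, "V": true_means["V"] + sv * dM}

# ... and undo the bias with (A4)-(A6).
C, means = correct_selection(obs, obs_means, true_means["M"], true["MM"])
for k in true:
    print(f"C_{k}: observed {obs[k]:+.4f}, corrected {C[k]:+.4f}, intrinsic {true[k]:+.4f}")
```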



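Note that the quantity which is the same in the full and
magnitude-limited samples is

$$ \bar C_{RV} - \frac{\bar C_{RM}\bar C_{VM}}{\bar C_{MM}} = C_{RV} - \frac{C_{RM}C_{VM}}{C_{MM}} = C_{RV}\,\frac{r_{RV} - r_{RM}r_{VM}}{r_{RV}}. \qquad (A7) $$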



This makes intuitive sense, because the expression above is
the part of the correlation between $R$ and $V$ which is not due
to the individual correlations between $R$ and $M$, and $V$ and
$M$. This part, i.e., the part which does not correlate with
$M$, remains unchanged by the magnitude-limited selection.
Similar relations hold for $C_{RR}$, $C_{VV}$, etc.
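The invariance in (A7) follows directly from (A2). Writing $\Delta C_{MM} \equiv \bar C_{MM} - C_{MM}$,

$$ \bar C_{RV} - \frac{\bar C_{RM}\bar C_{VM}}{\bar C_{MM}}
 = C_{RV} + \frac{C_{RM}C_{VM}}{C_{MM}^2}\,\Delta C_{MM} - \frac{C_{RM}C_{VM}}{C_{MM}^2}\,\bar C_{MM}
 = C_{RV} - \frac{C_{RM}C_{VM}}{C_{MM}}, $$

and setting $V \to R$ (or $R \to V$) in the same steps gives $\bar C_{RR} - \bar C_{RM}^2/\bar C_{MM} = C_{RR} - C_{RM}^2/C_{MM}$ and its $VV$ analogue.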

The analysis above shows that, to account for the selection
bias, all one needs is an estimate of the difference
between the unweighted and weighted mean and variance
of the absolute magnitude distribution (i.e. of the bias in
the luminosity function). In the SDSS dataset of Hyde &
Bernardi (2009b),

$$ \bar M = -21.94, \quad \bar C_{MM} = 0.65, \quad \langle M\rangle = -20.99, \quad\text{and}\quad C_{MM} = 0.76. \qquad (A8) $$

So, e.g., $\bar C_{RR} < C_{RR}$ and $\bar C_{VV} < C_{VV}$. This illustrates a
trivial but important point: the widths of the luminosity (and
other) distributions in a magnitude-limited catalog (i.e.,
before correcting for the selection effect) may be narrower
than those of the intrinsic distributions.
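Plugging the numbers in (A8) into (A1) and (A2) gives a feel for the size of the effect. Besides (A8), the sketch uses the size–magnitude slope $C_{RM}/C_{MM} \approx -0.24$ quoted in Section A3; the correlation coefficient $r_{RM} = -0.8$ used for $\bar C_{RR}/C_{RR} = 1 + r_{RM}^2\,\Delta C_{MM}/C_{MM}$ is an illustrative assumption.

```python
# Numbers from equation (A8) (SDSS sample of Hyde & Bernardi 2009b).
M_bar, C_MM_bar = -21.94, 0.65     # magnitude-limited (unweighted)
M_true, C_MM_true = -20.99, 0.76   # intrinsic (selection-corrected)

frac = (C_MM_bar - C_MM_true) / C_MM_true   # Delta C_MM / C_MM
dM = M_bar - M_true                          # Mbar - <M>

slope_RM = -0.24            # C_RM / C_MM, size-magnitude slope quoted in Section A3
dR = slope_RM * dM          # equation (A1): Rbar - <R>

r_RM = -0.8                 # ASSUMED correlation coefficient, for illustration only
shrink_RR = 1 + r_RM ** 2 * frac   # Cbar_RR / C_RR from equation (A2)

print(f"Delta C_MM / C_MM = {frac:.3f} (about -1/7)")
print(f"Mbar - <M> = {dM:.2f} mag, so Rbar - <R> = {dR:.3f} dex")
print(f"Cbar_RR / C_RR = {shrink_RR:.3f} for the assumed r_RM")
```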

The expressions above also show that the magnitude-limited
catalog can exhibit correlations between variables
even when there is no true intrinsic correlation. E.g.,

$$ \bar C_{IV} = C_{IV} + \big(\bar C_{MM} - C_{MM}\big)\,\frac{C_{VM}}{C_{MM}}\left(1 + 5\,\frac{C_{RM}}{C_{MM}}\right); \qquad (A9) $$

thus, $\bar C_{IV} \neq 0$ even if $C_{IV} = 0$. For similar reasons, absence
of a correlation in the magnitude-limited catalog does not
imply vanishing correlation in the full sample.
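A toy example makes this concrete. The sketch builds a Gaussian population, with assumed parameters, in which $I = M + 5R$ is uncorrelated with $V$ by construction ($C_{VM} = -5C_{RV}$) and $C_{RM}/C_{MM} = -0.24$, roughly mimicking the SDSS size–magnitude slope; it then keeps only the brighter half of the objects, a hypothetical magnitude limit. The selected objects show a clearly nonzero $\bar C_{IV}$ that matches (A9) to within sampling noise.

```python
import random


def cov(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


# Assumed toy population (M = absolute magnitude, R = log size, V = log dispersion).
# gamma is chosen so that C_VM + 5 C_RV = 0, i.e. C_IV = 0 for I = M + 5R.
gamma = (0.16 - 0.24 * 0.8) / 0.6
random.seed(5)
M, R, V = [], [], []
for _ in range(100000):
    g1, g2, g3 = (random.gauss(0.0, 1.0) for _ in range(3))
    M.append(g1)
    V.append(-0.8 * g1 + 0.6 * g2)
    R.append(-0.24 * g1 + gamma * g2 + 0.1 * g3)
I = [m + 5 * r for m, r in zip(M, R)]

C_MM, C_RM, C_VM = cov(M, M), cov(R, M), cov(V, M)
c_iv_full = cov(I, V)

# Hypothetical magnitude limit: keep the brighter half (M < 0).
sel = [i for i in range(len(M)) if M[i] < 0.0]
Cbar_MM = cov([M[i] for i in sel], [M[i] for i in sel])
c_iv_sel = cov([I[i] for i in sel], [V[i] for i in sel])

# Equation (A9)
c_iv_pred = c_iv_full + (Cbar_MM - C_MM) * (C_VM / C_MM) * (1 + 5 * C_RM / C_MM)

print(f"C_IV full sample:     {c_iv_full:+.4f}")
print(f"C_IV selected sample: {c_iv_sel:+.4f} (A9 predicts {c_iv_pred:+.4f})")
```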

We have verified that the expressions above agree with
measurements of the bias in mock catalogs in which there is
no curvature in the underlying scaling relations. In practice,
however, there is weak curvature in most scaling relations
(e.g., Hyde & Bernardi 2009a; Bernardi et al. 2011), and
this renders the expressions above only approximate. For
example, Hyde & Bernardi (2009b) report that $R = 0.62$,
$V = 2.3$ and $\mu = 19.71$, $C_{II} = 0.2660/2.5^2$, $C_{RR} = 0.0488$,
$C_{VV} = 0.0127$, $C_{IR} = -0.0820/2.5$, $C_{IV} = -0.0036/2.5$
and $C_{RV} = 0.0159$. These are not quite the same as one expects
from the expressions above, although the differences
can be understood in terms of how the underlying scaling
relations curve. Nevertheless, our analysis does serve to
illustrate which relations are expected to be insensitive to
selection effects arising from a magnitude limit, and which
are not.
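To see how such covariances translate into fit coefficients, the direct fit $R = aV + bI$ can be sketched as ordinary least squares, whose normal equations involve only $C_{VV}$, $C_{II}$, $C_{IV}$, $C_{RV}$ and $C_{IR}$. This is standard regression written out as an assumption, not a transcription of the direct-fit expressions in the main text, and the inputs are the values quoted above taken at face value; `direct_fit` is a hypothetical name.

```python
def direct_fit(C_RV, C_RI, C_VV, C_II, C_VI):
    """Least-squares coefficients of R = a V + b I (plus a constant), from the
    normal equations  C_VV a + C_VI b = C_RV  and  C_VI a + C_II b = C_RI."""
    det = C_VV * C_II - C_VI ** 2
    a = (C_RV * C_II - C_VI * C_RI) / det
    b = (C_VV * C_RI - C_VI * C_RV) / det
    return a, b


# Covariances as quoted in the text (including the factors of 2.5).
C_II = 0.2660 / 2.5 ** 2
C_VV, C_RV = 0.0127, 0.0159
C_IR, C_IV = -0.0820 / 2.5, -0.0036 / 2.5

a, b = direct_fit(C_RV, C_IR, C_VV, C_II, C_IV)
print(f"a_direct = {a:.3f}, b_direct = {b:.3f}")
print(f"C_RV/C_VV = {C_RV / C_VV:.3f} (the C_IV -> 0 approximation to a_direct)")
```

With these inputs the normal equations give $a \approx 1.17$ and $b \approx -0.73$, while the approximation $a_{\rm direct} \approx C_{RV}/C_{VV}$ gives about 1.25, so even a small $C_{IV}$ matters at the several-percent level.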



A3 (In)sensitivity to the bias 

For example, it is sometimes stated that the parameters of
the inverse fit (equation 33) and of the fit in which $I$ is the
dependent variable (equation 37) are not affected by the
selection effect. The analysis above shows that this is, in
general, not correct. However, if we ignore the selection effect,
then $(a_{\rm inv}, b_{\rm inv}) = (1.59, -0.716)$; Table 1 shows that
the correct values are $(1.606, -0.792)$, suggesting that, at least
in the SDSS dataset, $a_{\rm inv}$ is not strongly biased. In
addition, $I - \bar I = (1.23\pm0.04)(V - \bar V) - (1.07\pm0.02)(R - \bar R)$,
whereas the parameters from Table 1 show that
$I - \langle I\rangle = 1.18\,(V - \langle V\rangle) - 0.97\,(R - \langle R\rangle)$. For comparison, Graves
& Faber (2010) report $(1.16, -1.21)$ for a slightly different
early-type galaxy sample.

In all cases, $a$ is not strongly affected by the magnitude
limit. To see why, note that

$$ \frac{\bar C_{RV}}{\bar C_{VV}} = \frac{C_{RV}}{C_{VV}}\,
\frac{1 + (C_{RM}C_{VM}/C_{MM}C_{RV})(\Delta C_{MM}/C_{MM})}{1 + (C_{VM}^2/C_{MM}C_{VV})(\Delta C_{MM}/C_{MM})}
\;\to\; \frac{C_{RV}}{C_{VV}}\,\frac{1 - 5\,(C_{RM}/C_{MM})(\Delta C_{MM}/C_{MM})}{1 + r_{VM}^2\,(\Delta C_{MM}/C_{MM})}, \qquad (A10) $$

where we have defined $\Delta C_{MM} = \bar C_{MM} - C_{MM}$, and the
final expression holds in the limit $C_{IV} \to 0$, in which case
$C_{VM} \to -5C_{RV}$. Now, $C_{RM}/C_{MM}$ is the slope of the size–absolute
magnitude relation: in the SDSS, this is about
$-0.24$. Similarly, $\Delta C_{MM}/C_{MM} \approx -1/7$ and $r_{VM} \sim 0.8$, so
the net effect is to have $\bar C_{RV}/\bar C_{VV}$ within about ten percent
of $C_{RV}/C_{VV}$, making $\bar a_{\rm direct} \approx a_{\rm direct}$ also to within about
ten percent. Since $a_I = a_{\rm direct}$ when $C_{IV} \approx 0$, we expect
$\bar a_I \approx a_I$ to within ten percent. A similar analysis of $a_{\rm inv}$
shows why it too is not strongly affected by the magnitude
limit.
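The ten-percent estimate can be reproduced by evaluating the limiting form of (A10) with the approximate SDSS values quoted above ($C_{RM}/C_{MM} \approx -0.24$, $\Delta C_{MM}/C_{MM} \approx -1/7$, $r_{VM} \sim 0.8$); `a10_factor` is a hypothetical helper name.

```python
def a10_factor(slope_rm, frac_dcmm, r_vm):
    """(Cbar_RV/Cbar_VV) / (C_RV/C_VV) in the C_IV -> 0 limit of equation (A10).
    slope_rm = C_RM/C_MM, frac_dcmm = Delta C_MM / C_MM, r_vm = correlation of V and M."""
    return (1 - 5 * slope_rm * frac_dcmm) / (1 + r_vm ** 2 * frac_dcmm)


factor = a10_factor(slope_rm=-0.24, frac_dcmm=-1 / 7, r_vm=0.8)
print(f"Cbar_RV/Cbar_VV = {factor:.3f} x C_RV/C_VV")
```

The factor comes out at about 0.91: the shifts in the numerator and denominator of (A10) partially cancel, leaving $\bar C_{RV}/\bar C_{VV}$ roughly nine percent below $C_{RV}/C_{VV}$.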



A4 Biased estimates of the evolution of the 
zero-point 

Finally, it is worth emphasizing that, although we have focussed
on the slopes of the correlations, the fact that the
mean values in the magnitude-limited sample differ from
the correct values ($\bar V \neq \langle V\rangle$, etc.) means that the zero-points
of the relations can be affected even if the slopes are not.
Since the zero-point of the Fundamental Plane is often used
as a basis for estimating evolution, this estimate must be
made carefully in magnitude-limited samples. Bernardi et
al. (2003) show that this effect does indeed produce a significant
offset in the SDSS. Because we have shown how the
mean values and slopes are affected by the magnitude limit,
our analysis provides a straightforward way to correct for
this effect.

Perhaps as importantly, our analysis shows that, just
because a scaling relation is independent of the magnitude-limited
selection effect at one redshift, there is no guarantee
that it will remain insensitive at other $z$. As a specific
example, consider the case of differential luminosity evolution.
In the main text, we showed that if $C_{IV} \approx 0$ at $z = 0$, then
$C_{IV} \neq 0$ at $z > 0$ is guaranteed. However, $C_{IV} \approx 0$ played
a crucial role in the previous subsection, when we showed
why $a$ was insensitive to the magnitude-limited selection, so
at $z > 0$ this insensitivity is no longer guaranteed.